@yuzhegao
I have double-checked your question.
First, I apologize for any inconvenience caused by my negligence; I was already aware of this point when I implemented AL.
However, I actually use the "BATCH_SIZE" normalization in both class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu and the paper.
The definition of LossParameter in caffe.proto is:
// Message that stores parameters shared by loss layers
message LossParameter {
  // If specified, ignore instances with the given label.
  optional int32 ignore_label = 1;
  // How to normalize the loss for loss layers that aggregate across batches,
  // spatial dimensions, or other dimensions. Currently only implemented in
  // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers.
  enum NormalizationMode {
    // Divide by the number of examples in the batch times spatial dimensions.
    // Outputs that receive the ignore label will NOT be ignored in computing
    // the normalization factor.
    FULL = 0;
    // Divide by the total number of output locations that do not take the
    // ignore_label. If ignore_label is not set, this behaves like FULL.
    VALID = 1;
    // Divide by the batch size.
    BATCH_SIZE = 2;
    // Do not normalize the loss.
    NONE = 3;
  }
  // For historical reasons, the default normalization for
  // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.
  optional NormalizationMode normalization = 3 [default = VALID];
  // Deprecated. Ignored if normalization is specified. If normalization
  // is not specified, then setting this to false will be equivalent to
  // normalization = BATCH_SIZE to be consistent with previous behavior.
  optional bool normalize = 2;
}
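To make the difference between these modes concrete, here is a minimal sketch of how a Caffe-style loss layer turns a NormalizationMode into the divisor applied to the summed loss (modeled on the get_normalizer() helper of Caffe's standard sigmoid cross-entropy loss layer; the names below are only illustrative and are not taken from this repository):

// Illustrative sketch (not the repository's code): how a Caffe-style loss
// layer maps a NormalizationMode to the divisor applied to the summed loss.
// outer_num = N (batch size), inner_num = H * W (spatial locations per image).
#include <algorithm>

enum NormalizationMode { FULL = 0, VALID = 1, BATCH_SIZE = 2, NONE = 3 };

float get_normalizer(NormalizationMode mode, int outer_num, int inner_num,
                     int valid_count) {
  float normalizer;
  switch (mode) {
    case FULL:        // divide by N * H * W
      normalizer = static_cast<float>(outer_num * inner_num);
      break;
    case VALID:       // divide by the number of non-ignored locations
      normalizer = (valid_count == -1)
                       ? static_cast<float>(outer_num * inner_num)
                       : static_cast<float>(valid_count);
      break;
    case BATCH_SIZE:  // divide by N only
      normalizer = static_cast<float>(outer_num);
      break;
    case NONE:        // no normalization
    default:
      normalizer = 1.0f;
      break;
  }
  // Avoid dividing by zero when the batch contains no valid samples.
  return std::max(1.0f, normalizer);
}

For example, with N = 10 images of size 400x400, FULL would divide the summed loss by 10 * 160000, whereas BATCH_SIZE divides it by 10 only, so the two conventions differ by a constant factor of H * W.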
Next, the assignment of normalization_ in void ClassBalancedSigmoidCrossEntropyAttentionLossLayer<Dtype>::LayerSetUp is:
if (this->layer_param_.loss_param().has_normalization()) {
  normalization_ = this->layer_param_.loss_param().normalization();
} else if (this->layer_param_.loss_param().has_normalize()) {
  normalization_ = this->layer_param_.loss_param().normalize() ?
                   LossParameter_NormalizationMode_VALID :
                   LossParameter_NormalizationMode_BATCH_SIZE;
} else {
  normalization_ = LossParameter_NormalizationMode_BATCH_SIZE;
}
and the definition of AL is:
layer {
  name: "edge_loss"
  type: "ClassBalancedSigmoidCrossEntropyAttentionLoss"
  bottom: "unet1b_edge"
  bottom: "label_edge"
  top: "edge_loss"
  loss_weight: 1.0
  attention_loss_param {
    beta: 4.0
    gamma: 0.5
  }
}
As you can see, I set neither normalization nor normalize in the definition of AL, so the branch normalization_ = LossParameter_NormalizationMode_BATCH_SIZE; is the one that gets executed.
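For completeness, the way such a resolved mode is typically applied in the forward pass can be sketched as follows (again modeled on Caffe's standard sigmoid cross-entropy loss layer and reusing the get_normalizer() sketch above, not copied from class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu): with BATCH_SIZE, the summed per-pixel loss is divided by N, which is exactly the "sum up the loss and then divide by batch_size" behaviour described in the paper.

// Illustrative sketch, reusing the NormalizationMode enum and get_normalizer()
// helper from the sketch above; not the repository's code.
// total_loss is the attention loss summed over every pixel in the batch.
float normalize_loss(float total_loss, NormalizationMode mode,
                     int outer_num, int inner_num, int valid_count) {
  // With mode == BATCH_SIZE this returns total_loss / outer_num, i.e. the
  // summed loss divided by the batch size N.
  return total_loss / get_normalizer(mode, outer_num, inner_num, valid_count);
}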
In summary, the code implementation and the paper description are consistent.
Thank you for reading my paper.
It was my mistake and now I get it. Thanks a lot for your answer! :)
Hi! When I read the code of class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu to figure out the implementation details of this loss, I found that you use "FULL" normalization (sum up the loss and then divide by N*H*W). But in your paper, you use "BATCH_SIZE" normalization (sum up the loss and then divide by batch_size). Could you please tell me which method is the proper one? Thanks!