GuoxiaWang / DOOBNet

Caffe implementation of DOOBNet https://arxiv.org/abs/1806.03772

About the normalization of loss #4

Closed: yuzhegao closed this issue 5 years ago

yuzhegao commented 5 years ago

Hi! When I read the code of class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu to figure out the implementation details of this loss, I found that you use "FULL" normalization (sum the loss and then divide by N*H*W). But in your paper, you use "BATCH_SIZE" normalization (sum the loss and then divide by the batch size). Could you please tell me which method is correct? Thanks!

GuoxiaWang commented 5 years ago

@yuzhegao

I have double-checked your question.

First, I apologize for any inconvenience caused by my negligence; I was already aware of this point when I implemented the Attention Loss (AL).

However, I actually use "BATCH_SIZE" normalization in both class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu and the paper.


The definition of LossParameter in caffe.proto:

// Message that stores parameters shared by loss layers
message LossParameter {
  // If specified, ignore instances with the given label.
  optional int32 ignore_label = 1;
  // How to normalize the loss for loss layers that aggregate across batches,
  // spatial dimensions, or other dimensions.  Currently only implemented in
  // SoftmaxWithLoss and SigmoidCrossEntropyLoss layers.
  enum NormalizationMode {
    // Divide by the number of examples in the batch times spatial dimensions.
    // Outputs that receive the ignore label will NOT be ignored in computing
    // the normalization factor.
    FULL = 0;
    // Divide by the total number of output locations that do not take the
    // ignore_label.  If ignore_label is not set, this behaves like FULL.
    VALID = 1;
    // Divide by the batch size.
    BATCH_SIZE = 2;
    // Do not normalize the loss.
    NONE = 3;
  }
  // For historical reasons, the default normalization for
  // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.
  optional NormalizationMode normalization = 3 [default = VALID];
  // Deprecated.  Ignored if normalization is specified.  If normalization
  // is not specified, then setting this to false will be equivalent to
  // normalization = BATCH_SIZE to be consistent with previous behavior.
  optional bool normalize = 2;
}

and the assignment of normalization_ in ClassBalancedSigmoidCrossEntropyAttentionLossLayer<Dtype>::LayerSetUp is:

  if ( this->layer_param_.loss_param().has_normalization() ) {
      normalization_ = this->layer_param_.loss_param().normalization();
  }
  else if ( this->layer_param_.loss_param().has_normalize() ) {
      normalization_ = this->layer_param_.loss_param().normalize() ?
      LossParameter_NormalizationMode_VALID : LossParameter_NormalizationMode_BATCH_SIZE;
  }
  else {
      normalization_ = LossParameter_NormalizationMode_BATCH_SIZE;
  }
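
For reference, here is a minimal sketch of how a Caffe-style loss layer turns that enum into the divisor applied to the summed loss. It is modeled on Caffe's SigmoidCrossEntropyLossLayer::get_normalizer; outer_num_ (= N) and inner_num_ (= H*W) are passed as parameters only to keep the sketch self-contained, and the exact code in the AL layer may differ in detail:

// Sketch only (modeled on Caffe's SigmoidCrossEntropyLossLayer::get_normalizer),
// assuming the usual Caffe headers: caffe/proto/caffe.pb.h for the enum,
// glog for LOG(FATAL), and <algorithm> for std::max.
// outer_num_ is the batch size N, inner_num_ is the spatial size H*W.
template <typename Dtype>
Dtype get_normalizer(LossParameter_NormalizationMode mode, int valid_count,
                     int outer_num_, int inner_num_) {
  Dtype normalizer;
  switch (mode) {
    case LossParameter_NormalizationMode_FULL:
      normalizer = Dtype(outer_num_ * inner_num_);  // divide by N*H*W
      break;
    case LossParameter_NormalizationMode_VALID:
      normalizer = (valid_count == -1) ?
          Dtype(outer_num_ * inner_num_) : Dtype(valid_count);
      break;
    case LossParameter_NormalizationMode_BATCH_SIZE:
      normalizer = Dtype(outer_num_);               // divide by N only
      break;
    case LossParameter_NormalizationMode_NONE:
      normalizer = Dtype(1);                        // no normalization
      break;
    default:
      LOG(FATAL) << "Unknown normalization mode.";
  }
  // Avoid dividing by zero when a batch contains no valid labels.
  return std::max(Dtype(1.0), normalizer);
}

The forward pass divides the accumulated per-pixel loss by this normalizer, so with BATCH_SIZE the loss is summed over all H*W locations of each image and divided by N only, which is the normalization described in the paper.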

Finally, the definition of the AL layer is:

layer {
  name: "edge_loss"
  type: "ClassBalancedSigmoidCrossEntropyAttentionLoss"
  bottom: "unet1b_edge"
  bottom: "label_edge"
  top: "edge_loss"
  loss_weight: 1.0
  attention_loss_param {
    beta: 4.0
    gamma: 0.5
  }
}

As you can see, I set neither normalization nor normalize in the definition of the AL layer, so the fallback branch normalization_ = LossParameter_NormalizationMode_BATCH_SIZE; is executed.
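
If you wanted to make this choice explicit, or to experiment with a different mode, the standard loss_param field from caffe.proto shown above can be added to the layer definition. This is only an illustration of the option, not something the released prototxt contains:

layer {
  name: "edge_loss"
  type: "ClassBalancedSigmoidCrossEntropyAttentionLoss"
  bottom: "unet1b_edge"
  bottom: "label_edge"
  top: "edge_loss"
  loss_weight: 1.0
  loss_param {
    normalization: BATCH_SIZE  # explicit, identical to the default fallback
  }
  attention_loss_param {
    beta: 4.0
    gamma: 0.5
  }
}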

In summary, the code implementation and the paper description are consistent.

Thank you for reading my paper.

yuzhegao commented 5 years ago

It was my mistake, and now I get it. Thanks a lot for your answer! :)