A doubt about scale layer's backward (may be bug) #6604

Open huchhong opened 5 years ago

huchhong commented 5 years ago

Issue summary

I recently read the scale layer code and found something suspicious. Here it is:

template <typename Dtype>
void ScaleLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  // ... (earlier code omitted)
      } else {
        const Dtype* sum_mult = sum_multiplier_.cpu_data();
        sum_result = (outer_dim_ == 1) ?
            scale->mutable_cpu_diff() : sum_result_.mutable_cpu_data();
        // beta is Dtype(0), so gemv overwrites sum_result rather than adding to it
        caffe_cpu_gemv(CblasNoTrans, sum_result_.count(), inner_dim_,
                       Dtype(1), product, sum_mult, Dtype(0), sum_result);
      }
      if (outer_dim_ != 1) {
  // ... (rest omitted)

In the above code, if outer_dim_ == 1, then sum_result points at the scale diff, and caffe_cpu_gemv overwrites that diff with its result instead of adding to it, because the beta parameter of gemv is zero. This seems wrong. The same happens in the GPU version.
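To make the overwrite-versus-accumulate point concrete, here is a minimal sketch of the gemv contract, y = alpha * A * x + beta * y. It does not use Caffe or BLAS: `gemv_sketch` and `reduce_into_diff` are hypothetical helpers written only for illustration, and the 2 x 3 `product` values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a no-transpose gemv (not the Caffe/BLAS call):
//   y = alpha * A * x + beta * y, with A stored row-major as rows x cols.
void gemv_sketch(std::size_t rows, std::size_t cols, float alpha,
                 const std::vector<float>& A, const std::vector<float>& x,
                 float beta, std::vector<float>& y) {
  assert(A.size() == rows * cols && x.size() == cols && y.size() == rows);
  for (std::size_t r = 0; r < rows; ++r) {
    float acc = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) acc += A[r * cols + c] * x[c];
    y[r] = alpha * acc + beta * y[r];
  }
}

// Sums each row of a made-up 2 x 3 "product" into diff, mimicking the
// reduction over the inner dimension with a vector of ones.
// beta = 0 discards the previous diff; beta = 1 adds to it.
std::vector<float> reduce_into_diff(std::vector<float> diff, float beta) {
  const std::vector<float> product = {1, 2, 3, 4, 5, 6};
  const std::vector<float> ones(3, 1.0f);
  gemv_sketch(2, 3, 1.0f, product, ones, beta, diff);
  return diff;
}
```

Starting from a previous diff of {10, 20}, `reduce_into_diff(prev, 0.0f)` returns the row sums {6, 15} alone, while `reduce_into_diff(prev, 1.0f)` returns {16, 35}, i.e. the previous diff plus the row sums.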

huchhong commented 5 years ago

I tried this comparison:

  1. Set the train batch size to 1 and iter_size to 2.
  2. Set the train batch size to 2 and iter_size to 1.

The input data is a single image, so in theory these two tests should give the same scale diff, but they don't.
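A toy model of why the two runs could disagree, under stated assumptions: the loss is averaged over the batch, the solver divides the accumulated diff by iter_size, and every sample yields the same per-sample gradient g. `run_iterations` is a hypothetical helper, not Caffe code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one solver step: each entry of per_iter_grads is the
// scale gradient produced by one backward pass. With accumulate == true the
// diff is summed across passes; with accumulate == false each pass overwrites
// the previous value. The result is then divided by the number of passes
// (assumed to mirror normalization by iter_size).
float run_iterations(const std::vector<float>& per_iter_grads, bool accumulate) {
  assert(!per_iter_grads.empty());
  float diff = 0.0f;
  for (float g : per_iter_grads) {
    diff = accumulate ? diff + g : g;
  }
  return diff / static_cast<float>(per_iter_grads.size());
}
```

With g = 4, batch 2 and iter_size 1 corresponds to `run_iterations({4.0f}, true)`, which gives 4. Batch 1 and iter_size 2 gives 4 when accumulating, `run_iterations({4.0f, 4.0f}, true)`, but only 2 when each pass overwrites the diff, `run_iterations({4.0f, 4.0f}, false)`. That kind of mismatch is what the comparison above is meant to expose.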