BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

A doubt about the scale layer's backward (may be a bug) #6604

Open huchhong opened 5 years ago

huchhong commented 5 years ago

Issue summary

I read the scale layer code recently and found some suspicious code. Here it is:

template <typename Dtype>
void ScaleLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  ...
      } else {
        const Dtype* sum_mult = sum_multiplier_.cpu_data();
        // When outer_dim_ == 1, sum_result aliases the scale blob's diff.
        sum_result = (outer_dim_ == 1) ?
            scale->mutable_cpu_diff() : sum_result_.mutable_cpu_data();
        // The gemv below writes into sum_result with beta = 0.
        caffe_cpu_gemv(CblasNoTrans, sum_result_.count(), inner_dim_,
                       Dtype(1), product, sum_mult, Dtype(0), sum_result);
      }
      if (outer_dim_ != 1) {
  ...
}

In the code above, when outer_dim_ == 1, sum_result points directly at scale->mutable_cpu_diff(), so the existing scale diff is overwritten by the result of caffe_cpu_gemv rather than accumulated into, because the beta parameter of the gemv call is zero. This seems wrong whenever the gradient already stored in the scale diff should be kept, for example when accumulating over iter_size > 1. The same pattern appears in the GPU version.
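If this reading is correct, one possible fix (a sketch only, not tested) would be to accumulate into the scale diff in the outer_dim_ == 1 case, matching how the rest of Backward_cpu accumulates into scale_diff when the scale is a learned parameter. The scale_param flag used below is assumed to be the local bool defined earlier in Backward_cpu (true when the scale is a parameter blob rather than a second bottom); it is not visible in the excerpt above.

      } else {
        const Dtype* sum_mult = sum_multiplier_.cpu_data();
        sum_result = (outer_dim_ == 1) ?
            scale->mutable_cpu_diff() : sum_result_.mutable_cpu_data();
        // beta = 1 adds to the existing diff, beta = 0 overwrites it; only the
        // case where sum_result aliases a learned parameter's diff needs to
        // accumulate (sketch, untested).
        const Dtype beta = (outer_dim_ == 1 && scale_param) ? Dtype(1) : Dtype(0);
        caffe_cpu_gemv(CblasNoTrans, sum_result_.count(), inner_dim_,
                       Dtype(1), product, sum_mult, beta, sum_result);
      }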

Steps to reproduce

Just code review.

huchhong commented 5 years ago

I have tried this comparison:

  1. set train batch to 1 and iter_size to 2
  2. set train batch to 2 and iter_size to 1

The input data is a single image, so in theory these two runs should produce the same scale diff, but they do not.
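To make the comparison concrete, here is a small standalone sketch (hypothetical code, not Caffe; the gemv helper and the numbers are made up for illustration) that mimics the y = alpha * A * x + beta * y semantics of caffe_cpu_gemv. Ignoring the solver's 1/iter_size normalization, accumulating across two passes (beta = 1) matches a single pass over a batch holding two copies of the image, while overwriting (beta = 0) keeps only the last pass, which would explain the mismatch described above.

#include <cstdio>
#include <vector>

// Mimics caffe_cpu_gemv semantics: y = alpha * A * x + beta * y,
// with A stored row-major as rows x cols.
void gemv(int rows, int cols, float alpha, const float* A,
          const float* x, float beta, float* y) {
  for (int r = 0; r < rows; ++r) {
    float acc = 0.f;
    for (int c = 0; c < cols; ++c) acc += A[r * cols + c] * x[c];
    y[r] = alpha * acc + beta * y[r];
  }
}

int main() {
  // Made-up per-image "product" row (top_diff * bottom_data for one image);
  // the gemv sums it over inner_dim_ into the scale diff.
  const std::vector<float> one_image = {1.f, 2.f, 3.f, 4.f};
  const std::vector<float> ones4(4, 1.f);

  // batch = 1, iter_size = 2: two backward passes into the same diff value.
  float diff_beta0 = 0.f;  // beta = 0 as in the current code: overwrite
  float diff_beta1 = 0.f;  // beta = 1: accumulate across passes
  for (int pass = 0; pass < 2; ++pass) {
    gemv(1, 4, 1.f, one_image.data(), ones4.data(), 0.f, &diff_beta0);
    gemv(1, 4, 1.f, one_image.data(), ones4.data(), 1.f, &diff_beta1);
  }

  // batch = 2, iter_size = 1: one pass over two copies of the same image.
  const std::vector<float> two_images = {1.f, 2.f, 3.f, 4.f, 1.f, 2.f, 3.f, 4.f};
  const std::vector<float> ones8(8, 1.f);
  float diff_batch2 = 0.f;
  gemv(1, 8, 1.f, two_images.data(), ones8.data(), 0.f, &diff_batch2);

  std::printf("batch 1, 2 passes, beta = 0: %g\n", diff_beta0);   // 10
  std::printf("batch 1, 2 passes, beta = 1: %g\n", diff_beta1);   // 20
  std::printf("batch 2, 1 pass:             %g\n", diff_batch2);  // 20
  return 0;
}

Only the beta = 1 total agrees with the single-pass batch-2 result, which is the same discrepancy as in the batch/iter_size experiment above.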