BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

bug in sparse GaussianFiller #1497

Closed denizyuret closed 9 years ago

denizyuret commented 9 years ago

Here is the buggy code from filler.hpp, line 77. As mentioned in the comment, the number of inputs is taken from blob->height(), when it should actually come from blob->width(). I first noticed this error when I tried to set sparse to 200 in a 1000x100x3 regular feedforward linear network with a single hidden layer. Caffe complained that the Bernoulli probability (calculated below as 200/100) was above 1.0.

if (sparse >= 0) {
  // Sparse initialization is implemented for "weight" blobs; i.e. matrices.
  // These have num == channels == 1; height is number of inputs; width is
  // number of outputs.  The 'sparse' variable specifies the mean number
  // of non-zero input weights for a given output.
  CHECK_EQ(blob->num(), 1);
  CHECK_EQ(blob->channels(), 1);
  int num_inputs = blob->height();
  Dtype non_zero_probability = Dtype(sparse) / Dtype(num_inputs);
  rand_vec_.reset(new SyncedMemory(blob->count() * sizeof(int)));
  int* mask = reinterpret_cast<int*>(rand_vec_->mutable_cpu_data());
  caffe_rng_bernoulli(blob->count(), non_zero_probability, mask);
  for (int i = 0; i < blob->count(); ++i) {
    data[i] *= mask[i];
  }
}
shelhamer commented 9 years ago

@jeffdonahue could you check this?

jeffdonahue commented 9 years ago

Thanks for reporting this @denizyuret. You're correct, I screwed that up; height is indeed the number of outputs. But when I fixed it by changing to width, the MNIST autoencoder example (./examples/mnist/train_mnist_autoencoder.sh), currently the only official piece of Caffe that uses it, worked substantially worse (at least with 10k iterations of training; maybe it would have converged to the same loss in the end).

So, instead of actually changing the behavior, I just fixed the comments to match the current behavior (1304173). Of course I could update the sparse: N value in each layer of the MNIST example to get the exact old behavior, but that's more work than I care to do now, and if the fix screwed up the MNIST training as much as it did, it might also screw up users who have tuned the parameter according to the buggy behavior in their private Caffe use.

Really I should just have made the parameter a float sparse_p (a probability in [0,1]) instead of the whole divide-by-#inputs/outputs nonsense...