alexgkendall / caffe-segnet

Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling
http://mi.eng.cam.ac.uk/projects/segnet/

Batch Normalization Issue in SegNet #109

Open yifaliu opened 7 years ago

yifaliu commented 7 years ago

We have run into some batch normalization (BN) issues with the SegNet architecture and hope to get some thoughts on them. The issues are summarized in the slides at:

http://tier2.ihepa.ufl.edu/~skhirtladze/Segnet_bn_issue.pdf

Here is a summary of our concerns and observations…

  1. We tested the SegNet architecture with the batch normalization (BN) layer from SegNet Caffe; see slides 1 and 2 in the link above. We observed significant differences in the validation (or test) step with and without the "bn_mode: INFERENCE" line in inference.prototxt. As the slides show, it works much better when this line is commented out. Does this mean that our result contradicts the ones on the official SegNet website? What does 'bn_mode: INFERENCE' in the inference prototxt actually mean?

  2. We also tested the SegNet architecture with the batch normalization (BatchNorm) and Scale layers from official Caffe; basically, we replaced the BN layer from SegNet Caffe with the BatchNorm and Scale layers from official Caffe in both training.prototxt and inference.prototxt. However, the results didn't seem to make sense, as shown in slides 3 and 4 (link above). We believe that the BN layer in SegNet Caffe is equivalent to the BatchNorm + Scale layers in official Caffe; is that correct? We expected the network to converge in both cases, but that doesn't seem to happen; the results are not even comparable.

  3. We also tested the SegNet architecture without any batch normalization layer; see slide 5 (link above). Even though the per-class accuracy reaches 90% for both classes in our dataset, the prediction looks weird (see slide 5). As we understand it, batch normalization helps the network converge and avoids vanishing gradients during weight updates. It looks like there is no vanishing-gradient problem when we remove all the batch normalization layers from the architecture, because the per-class accuracy increases without any problem. Furthermore, we increased the number of training iterations to be on the safe side, but without any success.

Hope to get some thoughts on this issue.

TimoSaemann commented 7 years ago

Hi Yi,

A few thoughts from my side:

  1. When the line is commented out, the default bn_mode is active, which is "LEARN". In this mode the layer normalizes its inputs for each training batch to address internal covariate shift (cf. [1]). This mode is only intended for training. For testing purposes you need bn_mode "INFERENCE". Before this can work, however, you need to run the compute_bn_statistics.py script. What the script basically does is run the training network over the training data, compute the global statistics (mean, variance) for each BN layer, and save a new weights file that is then used together with "bn_mode: INFERENCE" at test time.
  2. SegNet's BN layer and the BatchNorm + Scale layers from BVLC/Caffe are basically the same. A small difference is that in SegNet the global statistics (mean, variance) are calculated offline with the compute_bn_statistics.py script, whereas BVLC/Caffe's BatchNorm layer computes the global mean/variance via a running average during training (the use_global_stats flag is toggled when switching from training to testing). I don't know why your results are so bad; they should actually be very similar (see the sketch below). Maybe you still need to set lr_mult: 0?
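
To make the equivalence concrete, here is a minimal NumPy sketch (just an illustration of the arithmetic, not the actual Caffe code; the function names and tensor shapes are made up for the example):

```python
import numpy as np

def segnet_bn_inference(x, mean, var, gamma, beta, eps=1e-5):
    # SegNet-style BN layer in INFERENCE mode: normalize with the stored
    # global statistics and apply the learned scale/shift in a single layer.
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bvlc_batchnorm_then_scale(x, mean, var, gamma, beta, eps=1e-5):
    # BVLC/Caffe equivalent: BatchNorm (with use_global_stats: true) only
    # normalizes; the following Scale layer applies gamma and beta.
    x_hat = (x - mean) / np.sqrt(var + eps)  # BatchNorm
    return gamma * x_hat + beta              # Scale (with bias_term: true)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64, 8, 8))               # (batch, channels, H, W)
mean = rng.normal(size=(1, 64, 1, 1))            # per-channel statistics
var = rng.uniform(0.5, 2.0, size=(1, 64, 1, 1))
gamma = rng.normal(size=(1, 64, 1, 1))           # learned scale
beta = rng.normal(size=(1, 64, 1, 1))            # learned shift

print(np.allclose(segnet_bn_inference(x, mean, var, gamma, beta),
                  bvlc_batchnorm_then_scale(x, mean, var, gamma, beta)))  # True
```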

I also want to add that, for inference, the batch normalization layer can be merged into the convolutional kernels to speed up the network. Both layers apply a linear transformation, so the batch normalization layer can be absorbed into the preceding convolutional layer by modifying its weights and biases. In doing so you can speed up SegNet by around 30% without a drop in quality. In fact, I plan to implement such a "BN absorber" for SegNet. Maybe I'll publish it on GitHub soon.
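
As a rough sketch of that merge (simplified NumPy, not the actual BN-absorber implementation; the function names and the 1x1-convolution check are made up for illustration):

```python
import numpy as np

def absorb_bn_into_conv(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold a BN layer into the convolution that precedes it.

    Since BN(conv(x)) = gamma * (W*x + b - mean) / sqrt(var + eps) + beta,
    the same output is produced by a single convolution with
        W' = (gamma / sqrt(var + eps)) * W
        b' = (gamma / sqrt(var + eps)) * (b - mean) + beta
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel factor
    W_folded = W * scale[:, None, None, None]   # rescale every filter
    b_folded = scale * (b - mean) + beta        # adjust the bias accordingly
    return W_folded, b_folded

# Quick numerical check with a 1x1 convolution written as an einsum.
def conv1x1(W, b, x):
    return np.einsum("oi,bihw->bohw", W[:, :, 0, 0], x) + b[None, :, None, None]

def bn_inference(y):
    # reference BN applied after the convolution (per-output-channel)
    g = gamma[None, :, None, None]
    return g * (y - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + 1e-5) + beta[None, :, None, None]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3, 1, 1))
b = rng.normal(size=8)
mean, var = rng.normal(size=8), rng.uniform(0.5, 2.0, size=8)
gamma, beta = rng.normal(size=8), rng.normal(size=8)
x = rng.normal(size=(2, 3, 5, 5))

W_f, b_f = absorb_bn_into_conv(W, b, mean, var, gamma, beta)
print(np.allclose(bn_inference(conv1x1(W, b, x)), conv1x1(W_f, b_f, x)))  # True
```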

[1] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." arXiv preprint arXiv:1502.03167 (2015).

Best, Timo

yifaliu commented 7 years ago

@TimoSaemann Thank you very much for your response. When we run compute_bn_statistics.py and do the prediction with "bn_mode: INFERENCE" included in inference.prototxt, the predicted result is very bad and looks weird. The predicted image is similar to the one shown in slide 5 (http://tier2.ihepa.ufl.edu/~skhirtladze/Segnet_bn_issue.pdf). In our tests, the predicted results only look reasonable when we comment 'bn_mode: INFERENCE' out of inference.prototxt. From my understanding, the inference network will use gamma and beta directly from the learning network if we comment 'bn_mode: INFERENCE' out. Is that correct? Thanks, Timo.
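
For concreteness, this is how we currently understand the two test-time behaviours, written as a minimal NumPy sketch rather than the actual Caffe code (the function names and axis layout are ours; x is assumed to have shape (batch, channels, H, W)):

```python
import numpy as np

def bn_test_learn_mode(x, gamma, beta, eps=1e-5):
    # 'bn_mode: INFERENCE' commented out (default LEARN): even at test time the
    # layer normalizes with the mean/variance of the current batch and then
    # applies the learned gamma and beta.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # statistics of this batch
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_test_inference_mode(x, global_mean, global_var, gamma, beta, eps=1e-5):
    # 'bn_mode: INFERENCE': the layer uses the fixed global statistics produced
    # by compute_bn_statistics.py, so the output no longer depends on the batch.
    return gamma * (x - global_mean) / np.sqrt(global_var + eps) + beta
```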

TimoSaemann commented 7 years ago

Yes that's correct.

I have already noticed the behaviour you describe when I trained with 'bn_mode: INFERENCE'. However, it does not lead to the best possible quality.

I also want to mention that if you comment out 'bn_mode: INFERENCE' during testing, the inference time increases significantly, because the batch statistics then have to be computed for every batch.

yifaliu commented 7 years ago

Thank you, Timo. When we increase the batch size and iteration number for SegNet training, the prediction results become what we expected and everything turns out to be normal (in this experiment we set batch_size to 4 and max_iter to 5000, whereas they were 1 and 2000 in the previous experiments).

It looks like we didn't train the network well in the previous runs. However, one thing that still puzzles me is why the prediction results (see slide 1, http://tier2.ihepa.ufl.edu/~skhirtladze/Segnet_bn_issue.pdf) are very good (almost all images are predicted well) when we commented 'bn_mode: INFERENCE' out of inference.prototxt in our previous experiment. If the network is not trained well, we would not expect good results under any conditions. Timo, do you have any comments on this point?

TimoSaemann commented 7 years ago

I am glad to hear that it works as expected.

By the way, I have implemented the script I talked about in my first post. You are welcome to try it out: https://github.com/TimoSaemann/caffe-segnet-cudnn5/blob/master/scripts/BN-absorber.py

I also created a pull request in the SegNet-Tutorial repository.

cacgs commented 7 years ago

@yifaliu did you find an answer to your second question?

(...) However, one thing that still puzzles me is why the prediction results (see slide 1, http://tier2.ihepa.ufl.edu/~skhirtladze/Segnet_bn_issue.pdf) are very good (almost all images are predicted well) when we commented 'bn_mode: INFERENCE' out of inference.prototxt in our previous experiment. If the network is not trained well, we would not expect good results under any conditions. Timo, do you have any comments on this point?

Ai-is-light commented 6 years ago

@TimoSaemann I would like to know whether the bn_mode: "LEARN" setting is something you customized yourself. In other words, can I use bn_mode: "LEARN" in the official Caffe? Thanks.