Closed: liuyipei closed this issue 8 years ago
It turns out that I needed to reduce the learning rate. After reducing the learning rate by 10x and increasing the effective batch size by 2x, I was able to train from scratch. Less extreme measures are most likely sufficient.
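For reference, that change amounts to something like the following in the solver prototxt (the values are illustrative, not the exact ones used; `iter_size` accumulates gradients over multiple forward/backward passes, raising the effective batch size without extra GPU memory):

```prototxt
# Illustrative sketch: assuming the original base_lr was 0.04.
base_lr: 0.004   # 10x lower learning rate
iter_size: 2     # accumulate gradients over 2 passes -> 2x effective batch size
```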
@ducha-aiki thanks. LSUV does seem to have a slightly faster start; in this case, my biggest problem was the learning rate.
@liuyipei with LSUV I was able to converge with big lr. But it is good, that other ways work as well :) See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/prototxt/architectures/SqueezeNet128_lsuv.prototxt
I like how you have trainval and solver in one file. Does Caffe accept that as-is, or did you customize Caffe to allow it?
Anyway, it looks convenient!
@forresti Yes, Caffe accepts it as-is; see the example in the Caffe master branch: https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_consolidated_solver.prototxt
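For anyone else wondering how this works: Caffe's `SolverParameter` has an inline `net_param` field, so the network definition can be embedded directly in the solver file instead of referenced via `net:`. A minimal sketch (layer names and values are illustrative):

```prototxt
# Solver and network in a single prototxt, as in lenet_consolidated_solver.prototxt
base_lr: 0.01
max_iter: 10000
snapshot_prefix: "snapshots/model"   # illustrative path
net_param {
  name: "TinyNet"                    # illustrative network
  layer {
    name: "data"
    type: "Input"
    top: "data"
    input_param { shape { dim: 1 dim: 3 dim: 32 dim: 32 } }
  }
  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    convolution_param { num_output: 16 kernel_size: 3 }
  }
}
```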
@liuyipei One more thing: I've run into a few problems with cuDNN and numerical correctness. I recommend trying a training run with cuDNN disabled, and seeing if you still get divergence.
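One way to rule out cuDNN without rebuilding Caffe (assuming a build where layers default to the cuDNN engine) is to force the Caffe engine on a per-layer basis in the prototxt:

```prototxt
layer {
  name: "conv1"        # illustrative layer
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    engine: CAFFE      # bypass cuDNN for this layer (default is DEFAULT/CUDNN)
  }
}
```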
@liuyipei Update: We have been experimenting with solver configurations, and we have identified a configuration that converges more reliably. We just committed it to SqueezeNet-master: 0bc03d9676fde79e4688ebba8b0d3a0e0c2c41da
This work is very exciting! The provided weights do work as expected, and the prototxt works out of the box with the default ilsvrc2012 lmdb data that comes with Caffe's examples.
However, when training from scratch, my training loss has not decreased even after the full 85k iterations. I tried rebuilding the latest version of Caffe, running a second time, and increasing the batch size by 4x; none of these attempts helped. Am I correct in understanding that the model is meant to be trained end-to-end, without tricks like layer-by-layer training?
To help me diagnose the problem, would it be possible for you to provide a reference initialization caffemodel (and/or one of your earliest intermediate snapshots)?
Thank you for your help!