Closed: liuyipei closed this issue 8 years ago
It turns out that I needed to reduce the learning rate. After reducing the learning rate by 10x and increasing the effective batch size by 2x, I was able to train from scratch. Less extreme measures are most likely sufficient.
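For reference, that change amounts to something like the following in the solver prototxt (the values are illustrative, not the exact ones used; `iter_size` accumulates gradients over multiple forward/backward passes, raising the effective batch size without extra GPU memory):

```prototxt
# Illustrative sketch: assuming the original base_lr was 0.04.
base_lr: 0.004   # 10x lower learning rate
iter_size: 2     # accumulate gradients over 2 passes -> 2x effective batch size
```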
@ducha-aiki thanks. LSUV does seem to have a slightly faster start; in this case, my biggest problem was the learning rate.
@liuyipei with LSUV I was able to converge with big lr. But it is good, that other ways work as well :) See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/prototxt/architectures/SqueezeNet128_lsuv.prototxt
I like how you have trainval and solver in one file. Does Caffe accept that as-is, or did you customize Caffe to allow it?
Anyway, it looks convenient!
@forresti Yes, Caffe accepts it as-is; see the example in the Caffe master branch: https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_consolidated_solver.prototxt
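For anyone else wondering how this works: Caffe's `SolverParameter` has an inline `net_param` field, so the network definition can be embedded directly in the solver file instead of referenced via `net:`. A minimal sketch (layer names and values are illustrative):

```prototxt
# Solver and network in a single prototxt, as in lenet_consolidated_solver.prototxt
base_lr: 0.01
max_iter: 10000
snapshot_prefix: "snapshots/model"   # illustrative path
net_param {
  name: "TinyNet"                    # illustrative network
  layer {
    name: "data"
    type: "Input"
    top: "data"
    input_param { shape { dim: 1 dim: 3 dim: 32 dim: 32 } }
  }
  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    convolution_param { num_output: 16 kernel_size: 3 }
  }
}
```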
@liuyipei One more thing: I've run into a few problems with cuDNN and numerical correctness. I recommend trying a training run with cuDNN disabled, and seeing if you still get divergence.
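One way to rule out cuDNN without rebuilding Caffe (assuming a build where layers default to the cuDNN engine) is to force the Caffe engine on a per-layer basis in the prototxt:

```prototxt
layer {
  name: "conv1"        # illustrative layer
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    engine: CAFFE      # bypass cuDNN for this layer (default is DEFAULT/CUDNN)
  }
}
```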
@liuyipei Update: We have been experimenting with solver configurations, and we have identified a configuration that converges more reliably. We just committed it to SqueezeNet-master: 0bc03d9676fde79e4688ebba8b0d3a0e0c2c41da
This work is very exciting! The provided weights do work as expected, and the prototxt works out of the box with the default ilsvrc2012 lmdb data that comes with Caffe's examples.
However, when training from scratch, my training loss has not decreased even after the full 85k iterations. I tried rebuilding the latest version of Caffe, running a second time, and increasing the batch size by 4x; none of these attempts helped. Am I correct in understanding that the model is meant to be trained end-to-end, without tricks like layer-by-layer training?
To help me diagnose the problem, would it be possible for you to provide a reference initialization caffemodel (and/or one of your earliest intermediate snapshots)?
Thank you for your help!