forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters

SqueezeNet benchmark #3

Closed (ducha-aiki closed this 7 years ago)

ducha-aiki commented 8 years ago

Hi,

SqueezeNet is a really cool architecture! I have added it to my caffenet-variants benchmark, and it looks even better than CaffeNet: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Architectures.md

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| CaffeNet128-2048 | 0.470 | 2.36 | Pool5 = 3x3, fc6-fc7 = 2048 |
| CaffeNet128-4096 | 0.497 | 2.24 | Pool5 = 3x3, fc6-fc7 = 4096 |
| SqueezeNet128 | 0.530 | 2.08 | Reference SqueezeNet solver, but linear lr_policy and batch_size = 256 (320K iters) |
| SqueezeNet128 + ELU | 0.555 | 1.95 | Reference solver, but linear lr_policy and batch_size = 256 (320K iters); ELU |

Note that for speed reasons I use an image size of 128 px, so the performance of all nets is degraded compared to the classical 227 px.

I'd like to suggest a slightly different solver setup for SqueezeNet. According to my tests on caffenet128, a linear lr_policy works better than the squared one in your solver (results in the table below; a solver sketch follows it): https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Lr_policy.md

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| Step 100K | 0.470 | 2.36 | Default caffenet solver, max_iter = 320K |
| Poly lr, p = 0.5 (sqrt) | 0.483 | 2.29 | bvlc_quick_googlenet_solver; behind "step" for most of training, but ahead at the finish |
| Poly lr, p = 2.0 (sqr) | 0.483 | 2.299 | |
| Poly lr, p = 1.0 (linear) | 0.493 | _2.24_ | |
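
For concreteness, here is a minimal Caffe solver sketch for the linear policy. Apart from lr_policy, power, max_iter = 320K, and the effective batch size of 256 (which is set in the data layer, not the solver), all values are assumed placeholders, not my exact benchmark configuration:

```
# Caffe solver sketch: with lr_policy "poly",
#   lr = base_lr * (1 - iter/max_iter)^power,
# so power = 1.0 gives a linear decay to zero at max_iter.
net: "train_val.prototxt"   # assumed path
lr_policy: "poly"
power: 1.0                  # linear schedule
max_iter: 320000            # 320K iters, as in the benchmark
base_lr: 0.01               # assumed; keep the reference SqueezeNet value
momentum: 0.9               # assumed
weight_decay: 0.0002        # assumed
solver_mode: GPU
# Note: batch_size (256 in the benchmark) lives in the
# train_val.prototxt data layer, not in the solver file.
```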

Best regards, Dmytro.

forresti commented 8 years ago

Very cool!

ducha-aiki commented 8 years ago

Added SqueezeNet + ELU (instead of ReLU) to the benchmark; see the updated table above.
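
For anyone reproducing this in Caffe, the change is one line per activation layer: swap type "ReLU" for "ELU". The layer and blob names below are illustrative, not copied from the actual prototxt:

```
# Before: a plain ReLU activation (illustrative names).
layer {
  name: "fire2/relu"
  type: "ReLU"
  bottom: "fire2/concat"
  top: "fire2/concat"
}
# After: the same layer as an ELU; alpha defaults to 1.0 in Caffe.
layer {
  name: "fire2/elu"
  type: "ELU"
  bottom: "fire2/concat"
  top: "fire2/concat"
  elu_param { alpha: 1.0 }
}
```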

gaush123 commented 8 years ago

Hi, I'd like to test this model. How can I get or generate a deploy.prototxt file for it?

forresti commented 8 years ago

@gaush123 There's a PR for a deploy.prototxt here: https://github.com/DeepScale/SqueezeNet/pull/2

We still need to run our own sanity check on this deploy.prototxt, but it looks right to me.
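
For context, a deploy.prototxt is essentially the train_val network with the data layer replaced by a fixed input declaration and the loss layer replaced by a Softmax. A minimal sketch of how such a file would start (input dimensions assumed for standard 227x227 crops; this is not the file from the PR):

```
# Sketch of the head of a deploy.prototxt.
name: "SqueezeNet"
input: "data"
input_shape {
  dim: 1    # batch size
  dim: 3    # channels
  dim: 227  # height (assumed crop size)
  dim: 227  # width
}
# ...the conv/fire layers follow unchanged from train_val.prototxt...
```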

gaush123 commented 8 years ago

@forresti One more question: why is no kernel size specified in the "pool10" layer, the layer directly connected to the softmax at the very top?

forresti commented 8 years ago

@gaush123 In pool10, we use global_pooling: true. In Caffe, global pooling sets the kernel size equal to the spatial size of the input blob. So, if conv10 outputs 13x13xChannels, then pool10 effectively has a 13x13 kernel.

This is a nice bit of flexibility -- it lets you feed in images of various sizes, and the CNN will still produce a 1x1x1000 classification vector.
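
For reference, the pool10 definition looks roughly like this (a sketch of the structure; check the released prototxt for the exact layer):

```
# Global average pooling: no kernel_size is needed, because
# global_pooling: true sets the kernel to the input's spatial size.
layer {
  name: "pool10"
  type: "Pooling"
  bottom: "conv10"
  top: "pool10"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}
```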

gaush123 commented 8 years ago

Thanks @forresti. One more thing: when I download this model, my system reports its file type as a 'pcx' image, while the other standard models are in binary format. Is it possible to do some kind of conversion here?

forresti commented 8 years ago

See #5.

ducha-aiki commented 8 years ago

@forresti We have now released the tech report (http://arxiv.org/abs/1606.02228), so you can cite the linear lr_policy ;) Also, I hope that you can adopt some of the other findings from it for SqueezeNet.

forresti commented 8 years ago

@ducha-aiki Great! We have another upcoming publication, and we will cite this!