SqueezeNet out of memory with batch size (512) smaller than AlexNet (1024)

forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters


Issue #19, closed by wenwei202 8 years ago

wenwei202 commented 8 years ago

Hi, it's magic to see the parameters squeezed so much, great work. I ran into two issues when I "caffe time" the model on a Titan X:

1) SqueezeNet is slower than AlexNet with the same batch size of 256.

SqueezeNet:

I0722 09:33:39.867264 18424 caffe.cpp:377] Average Forward pass: 128.444 ms.
I0722 09:33:39.867269 18424 caffe.cpp:379] Average Backward pass: 307.341 ms.
I0722 09:33:39.867275 18424 caffe.cpp:381] Average Forward-Backward: 436.085 ms.

AlexNet:

I0722 09:34:11.348625 18438 caffe.cpp:377] Average Forward pass: 91.4737 ms.
I0722 09:34:11.348630 18438 caffe.cpp:379] Average Backward pass: 175.433 ms.
I0722 09:34:11.348635 18438 caffe.cpp:381] Average Forward-Backward: 267.041 ms.

2) SqueezeNet runs out of memory at a batch size (512) smaller than the one AlexNet can handle (1024).

Did I do something wrong, or are these issues expected after increasing the number of layers?

Thank you so much.

Grabber commented 8 years ago

Maybe it is caused by the concat layers?

dgschwend commented 8 years ago

You can compare the networks using Netscope here: AlexNet, SqueezeNet v1.0, SqueezeNet v1.1

This table gives a summary:

[Image: comparison_cnns]

Problem 2) is definitely because SqueezeNet uses more activation memory than AlexNet (5x as much in v1.0 and 3x as much in v1.1). The largest output feature map in SqNet v1.1 is 817k pixels, in AlexNet 290k pixels.
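
The two feature-map sizes quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the largest maps are the conv1 outputs with the usual Caffe shapes (227x227 input; AlexNet conv1: 96 filters, 11x11, stride 4; SqueezeNet v1.1 conv1: 64 filters, 3x3, stride 2). The helper names are made up for illustration, and the byte estimate covers only the forward output of that single layer in float32, not the whole network.

```python
def conv_output_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (in_size + 2 * pad - kernel) // stride + 1


def feature_map_elements(channels, in_size, kernel, stride, pad=0):
    """Number of values in one square output feature map for a single image."""
    out = conv_output_size(in_size, kernel, stride, pad)
    return channels * out * out


def batch_bytes(elements, batch_size, bytes_per_value=4):
    """Bytes needed to hold that feature map for a whole batch (float32 assumed)."""
    return elements * batch_size * bytes_per_value


# Assumed layer shapes, see the note above.
alexnet_conv1 = feature_map_elements(channels=96, in_size=227, kernel=11, stride=4)
squeezenet_v11_conv1 = feature_map_elements(channels=64, in_size=227, kernel=3, stride=2)

print(alexnet_conv1)          # 290400  (the "290k" above)
print(squeezenet_v11_conv1)   # 817216  (the "817k" above)

# Forward output of that one layer, at the batch sizes from the question.
print(batch_bytes(alexnet_conv1, 1024))         # 1189478400 bytes
print(batch_bytes(squeezenet_v11_conv1, 512))   # 1673658368 bytes
```

Under these assumptions, that one layer already needs more memory for SqueezeNet v1.1 at batch size 512 than for AlexNet at batch size 1024.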

Problem 1) could have different reasons:

- SqueezeNet v1.0 has approximately the same computational complexity as AlexNet (860 million vs. 1140 million multiply-accumulate operations), and SqueezeNet v1.1 should be less complex (390 million MACCs).
- SqueezeNet has many more layers than AlexNet (26 CONV layers versus 5 CONV layers), and switching between layers takes some time on the CPU and GPU.
- SqueezeNet has larger output feature maps (compare Problem 2 above). Moving this data around between the GPU and memory takes some time, too.
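
For reference, the multiply-accumulate count of a single convolution layer is commonly computed as kernel area times input channels times output channels times output positions. The sketch below uses that standard formula. It is not taken from any particular tool, the function name is made up for illustration, and the conv1 shapes are the usual Caffe ones (assumed, not stated in this thread). Summing this over all layers gives a network total.

```python
def conv_maccs(kernel, in_channels, out_channels, out_size, groups=1):
    """Multiply-accumulate operations for one square convolution layer.

    Each of the out_size * out_size output positions in each of the
    out_channels maps needs kernel * kernel * (in_channels / groups) MACCs.
    """
    return kernel * kernel * (in_channels // groups) * out_channels * out_size * out_size


# Assumed conv1 shapes for a 227x227 RGB input.
alexnet_conv1 = conv_maccs(kernel=11, in_channels=3, out_channels=96, out_size=55)
squeezenet_v11_conv1 = conv_maccs(kernel=3, in_channels=3, out_channels=64, out_size=113)

print(alexnet_conv1)          # 105415200
print(squeezenet_v11_conv1)   # 22064832
```

A low MACC count does not by itself predict runtime: a layer with few operations but a large output map can still be limited by memory traffic, which fits the third point above.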

So I don't think you're doing anything wrong. This is probably the price you have to pay for the increased network complexity. If you haven't already, definitely give SqueezeNet v1.1 a try!

wenwei202 commented 8 years ago

@dgschwend Thank you so much for all the details. Netscope is such a cool tool! I was using v1.1. I guess the problem comes from the increased activations and layers, as you pointed out, and from the concat layers, as @Grabber mentioned.

dgschwend commented 8 years ago

You are welcome, glad I could help! ;)

argman commented 8 years ago

@dgschwend, is there any document about how to compute the MACCs and memory of a network? Thanks.

dgschwend commented 8 years ago

You can use the Netscope CNN Analyzer tool: https://dgschwend.github.io/netscope/

The actual code for the calculation is here: https://github.com/dgschwend/netscope/blob/gh-pages/src/analyzer.coffee

The formulas are pretty self-explanatory.

forresti commented 8 years ago

@dgschwend Nice work on that table. Is "Table 4.1" part of a longer paper that I could read (and cite)?

dgschwend commented 8 years ago

@forresti Yes, I just published my Master's thesis.

The project report and all code for "ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network" are public. The whole work is based on SqueezeNet, and the report contains a detailed analysis + comparison of prior CNN topologies.

davidbrai commented 7 years ago

@dgschwend Hi, can you share the reference for "SqueezeNet++" in your table? I couldn't find it.

dgschwend commented 7 years ago

That's the work-in-progress codename of the ZynqNet CNN from my Master's thesis (see post above). See https://github.com/dgschwend/zynqnet

mrgloom commented 5 years ago

I have the same issue: SqueezeNet v1.1 is slower than AlexNet, 3.91 ms vs. 3.01 ms. https://github.com/mrgloom/kaggle-dogs-vs-cats-solution