forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
BSD 2-Clause "Simplified" License

regarding performance improvement for AlexNet #16

Closed: williamjames1 closed this issue 8 years ago

williamjames1 commented 8 years ago

Hi, I want to know the performance improvement for SqueezeNet with reference to AlexNet. Any idea?

Thanks. William. J.

forresti commented 8 years ago

When you say "performance," do you mean "speed," "accuracy," or something else?

williamjames1 commented 8 years ago

Yes. Basically, I want to understand the time required to process one image using the original AlexNet versus SqueezeNet at more or less the same accuracy (using GPU/cuDNN).

Grabber commented 8 years ago

@forresti That would be an extremely useful benchmark. When we talk about embedding CNNs into small devices, it is not only about shrinking the model size but also about aggressively reducing the number of computations per frame.

I really think SqueezeNet is on the right track, but there are no published numbers on this...

dgschwend commented 8 years ago

You might be interested in my CNN analysis tool at http://dgschwend.github.io/netscope. The web-based tool lets you calculate the total number of operations, weights, and activation memory needed for each layer in a given Caffe network. SqueezeNet v1.0, SqueezeNet v1.1, and AlexNet are included as presets.

Processing time should be more or less proportional to the number of multiply-accumulate (MACC) operations. In embedded systems, the intermediate memory needed for the activations/feature maps is probably also relevant. SqueezeNet v1.1 is definitely an improvement there. Here's a summary:

| CNN | #MACC operations | #Weights | #Activations |
| --- | --- | --- | --- |
| AlexNet | 1140M | 62.37M | 2.39M |
| SqueezeNet v1.1 | 388M | 1.23M | 7.84M |
| SqueezeNet v1.0 | 861M | 1.24M | 12.73M |
| Inception v3 | 3230M | 23.83M | 18.51M |
| GoogLeNet | 1600M | 6.99M | 10.37M |
| VGG-16 | 16360M | 169.8M | 30.06M |

Edit: added some other well-known CNNs for comparison; all input crops are 227x227x3.
Edit 2: fixed the MACC count for VGG-16.
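
For reference, the MACC count of a standard convolution layer follows directly from the layer shape. A minimal sketch in plain Python (the `conv_maccs` helper is just for illustration, not Netscope's actual code):

```python
def conv_maccs(out_h, out_w, out_c, k_h, k_w, in_c, groups=1):
    """MACCs for one convolution layer: each output element costs
    k_h * k_w * (in_c / groups) multiply-accumulates."""
    return out_h * out_w * out_c * k_h * k_w * (in_c // groups)

# AlexNet conv1: 96 filters of 11x11x3 on a 227x227x3 crop, stride 4
# -> 55x55x96 output, ~105M MACCs
print(conv_maccs(55, 55, 96, 11, 11, 3))  # 105415200
```

Summing this over all layers (plus the fully-connected layers, where the MACC count simply equals the weight count) gives the totals in the table above.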

forresti commented 8 years ago

@dgschwend Very nice! I have been using Netscope.

If I remember correctly, GoogLeNet-v1 has ~10x fewer MACCs than VGG-19. Could I be wrong about that?

dgschwend commented 8 years ago

@forresti You're right, I somehow missed one digit... fixed!

Grabber commented 8 years ago

@dgschwend That is a must-have tool; I was thinking of exactly that, an oscilloscope for CNNs! Do you think your tool could generate visual representations of the layers too?

In the Darknet framework there is a feature called visualize that generates visual representations of the filters, layer by layer. Take a look:

[screenshot: Darknet's layer-by-layer filter visualization]

It would be useful to have this kind of visual representation rendered in the popup that is shown when you hover the mouse over a network layer.


dgschwend commented 8 years ago

@Grabber This gets a little bit off-topic, maybe we can move the discussion to my netscope repository? https://github.com/dgschwend/netscope/issues/1

forresti commented 8 years ago

@dgschwend BTW, I noticed you've run Netscope on Inception-v3. Do you have Caffe config files for Inception-v3? (And, better yet... a working training protocol for Inception-v3?)

http://dgschwend.github.io/netscope/#/preset/inceptionv3

dgschwend commented 8 years ago

@forresti You can view and edit the ".prototxt" content by clicking on the "(edit)" link near the network title (http://dgschwend.github.io/netscope/#/preset/inceptionv3)

The original model is from https://github.com/smichalowski/google_inception_v3_for_caffe, but I never tried training it.

forresti commented 8 years ago

@dgschwend Got it! Thanks a lot!

psyhtest commented 8 years ago

@williamjames1 @Grabber @dgschwend @forresti

By the way, we have recently released CK-Caffe, a framework for collaborative performance analysis and optimisation of Caffe across multiple platforms, libraries, models, etc.

For example, this Jupyter notebook compares the best performance per image across 4 CNNs and 4 BLAS libraries on a Samsung Chromebook 2 platform. When using OpenBLAS, SqueezeNet 1.1 is 2 times faster than SqueezeNet 1.0 and 2.4 times faster than AlexNet, broadly in line with expectations set by the SqueezeNet paper.

We also have comparisons for other platforms, models, and optimisations. (We are discussing with our customer what we can release in addition to the core CK-Caffe framework, and when.)

In addition, we are working on an engine for crowdsourcing benchmark results from Linux, Android, Windows, etc. platforms. Stay tuned and feel free to get in touch!

gbrand-salesforce commented 8 years ago

I wonder how it compares to ResNet-50 (or even ResNet-18, which would probably be closer to the accuracy of SqueezeNet).

psyhtest commented 8 years ago

@gbrand-salesforce That's the sort of question we are aiming to answer with CK-Caffe. If you have a deploy.prototxt and a platform of interest, we can easily run the experiments there and share the results to build common knowledge. Ping me if you are interested.

dgschwend commented 7 years ago

@psyhtest Looks like a very interesting project!

Feel free to benchmark my ZynqNet CNN, too, if you're interested. I started with SqueezeNet and tried to build a very well-balanced CNN architecture, which fits well onto a custom-designed FPGA accelerator (you might be interested in this part as well...).

The project report and all code from my Master Thesis "ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network" are public. :wink:

michaelholm-ce commented 7 years ago

Hello all --- I'm interested in finding the fastest model architecture (i.e., the lowest number of MACCs) with reasonable accuracy (speed is more important than accuracy for me). Based on the comparisons posted here, it looks like SqueezeNet v1.1 is the best choice, but from my reading, the Darknet reference model (https://pjreddie.com/darknet/imagenet/#reference) and the so-called QuickNet (https://arxiv.org/pdf/1701.02291.pdf) seem faster. However, I have not been able to find any Caffe implementations of these. Ideally, I would like to train using Caffe in DIGITS, but I do not have the experience to implement them in Caffe from scratch.

Any thoughts or recommendations here?

mrgloom commented 7 years ago

In my tests with Caffe, SqueezeNet v1.1 is slightly slower than AlexNet (I was using the built-in tool for measuring forward-pass performance): https://github.com/mrgloom/kaggle-dogs-vs-cats-solution
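
For anyone who wants to reproduce this kind of measurement, here is a minimal pycaffe timing sketch (the model and weight paths are placeholders; substitute your own files):

```python
import time
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
# placeholder paths: point these at your own deploy.prototxt / .caffemodel
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

net.forward()  # warm-up pass, excluded from the timing

n_iter = 50
start = time.time()
for _ in range(n_iter):
    net.forward()
elapsed_ms = (time.time() - start) / n_iter * 1000.0
print('average forward pass: %.1f ms' % elapsed_ms)
```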

Off-topic: regarding layer activation and weight visualization, NVIDIA DIGITS can do this, but Netscope has a nicer visualization of the network structure.

anuragmundhada commented 6 years ago

The GoogLeNet page on all the Netscope analyzers out there shows the wrong MACC count (it's 10 times too high). Do you know why that is? @dgschwend https://dgschwend.github.io/netscope/#/preset/googlenet

dgschwend commented 6 years ago

@anuragmundhada What's your golden reference regarding the MACCs? Let's open an issue on the http://github.com/dgschwend/netscope project for that discussion...

anuragmundhada commented 6 years ago

I took the reference from the Inception-v2 paper, "Rethinking the Inception Architecture for Computer Vision" (https://arxiv.org/pdf/1512.00567.pdf). Table 3 states that the cost is 1.5 Bn ops, which should correspond to 1.5G MACC, if I am not wrong.

I'm opening an issue on your repo. I mentioned it here only because you had posted a comment above and seemed to have faced the same problem before correcting it.