When you say "performance," do you mean "speed," "accuracy," or something else?
Yes, basically I want to understand the time required to process one image with the original AlexNet versus with SqueezeNet at more or less the same accuracy (using GPU with cuDNN).
@forresti that's an extremely useful benchmark: when we are talking about embedding CNNs into small devices, it is not only about shrinking the model size but also about aggressively reducing the number of computations per frame.
I really think SqueezeNet is a step in that direction, but there are no published numbers about it yet...
You might be interested in my CNN analysis tool at http://dgschwend.github.io/netscope. The web-based tool allows you to calculate the total number of operations, weights, and activation memory needed for each layer in a given Caffe network. SqueezeNet v1.0, SqueezeNet v1.1, and AlexNet are included as presets.
Processing time should be more or less proportional to the number of multiply-accumulate (MACC) operations. In embedded systems, the intermediate memory needed for the activations/feature maps is probably also relevant, and SqueezeNet v1.1 is definitely an improvement there. Here's a summary (a sketch of how these per-layer numbers can be counted follows the table):
CNN | #MACC Operations | #Weights | #Activations |
---|---|---|---|
AlexNet | 1140M | 62.37M | 2.39M |
Squeezenet v1.1 | 388M | 1.23M | 7.84M |
Squeezenet v1.0 | 861M | 1.24M | 12.73M |
Inception v3 | 3230M | 23.83M | 18.51M |
GoogleNet | 1600M | 6.99M | 10.37M |
VGG-16 | 16360M | 169.8M | 30.06M |
Edit: added some other well-known CNNs for comparison (all input crops = 227×227×3). Edit 2: fixed the MACCs for VGG-16.
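For anyone who wants to reproduce these per-layer numbers by hand, here is a minimal sketch (not the actual Netscope code) of how MACCs, weights, and activations can be counted for a single convolution layer; the exact totals may differ slightly from the tool depending on how biases and padding are handled.

```python
# Rough per-layer cost estimate for a convolution layer (sketch, not Netscope source).
def conv_stats(in_c, in_h, in_w, out_c, k, stride=1, pad=0, groups=1):
    """Return (MACCs, weights, activations) for one convolution layer."""
    out_h = (in_h + 2 * pad - k) // stride + 1
    out_w = (in_w + 2 * pad - k) // stride + 1
    maccs = out_h * out_w * out_c * k * k * (in_c // groups)  # one MACC per weight per output pixel
    weights = out_c * k * k * (in_c // groups) + out_c        # filters + biases
    activations = out_c * out_h * out_w                       # size of the output feature map
    return maccs, weights, activations

# Example: AlexNet conv1 (96 filters of 11x11, stride 4, on a 227x227x3 crop)
maccs, weights, acts = conv_stats(3, 227, 227, 96, 11, stride=4)
print("conv1: %.0fM MACCs, %.1fk weights, %.2fM activations"
      % (maccs / 1e6, weights / 1e3, acts / 1e6))
# -> roughly 105M MACCs, 34.9k weights, 0.29M activations
```

Summing this over all layers (plus the fully-connected layers, where MACCs are simply input_dim × output_dim) gives totals in the ballpark of the table above.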
@dgschwend Very nice! I have been using Netscope.
If I remember correctly, GoogLeNet-v1 has ~10x fewer MACCs than VGG-19. Could I be wrong about that?
@forresti You're right, I somehow missed one digit... fixed!
@dgschwend That is a must-have tool; I was thinking of exactly that, an oscilloscope for CNNs! Do you think your tool could also generate visual representations of the layers?
The Darknet framework has a feature called visualize that generates visual representations of the filters, layer by layer.
It would be useful to have this kind of visual representation rendered in the menu that is shown when you hover the mouse over a network layer.
@Grabber This gets a little bit off-topic, maybe we can move the discussion to my netscope repository? https://github.com/dgschwend/netscope/issues/1
@dgschwend BTW, I noticed you've run Netscope on Inception-v3. Do you have Caffe config files for Inception-v3? (And, better yet... a working training protocol for Inception-v3?)
@forresti You can view and edit the ".prototxt" content by clicking on the "(edit)" link near the network title (http://dgschwend.github.io/netscope/#/preset/inceptionv3)
The original model is from https://github.com/smichalowski/google_inception_v3_for_caffe, but I never tried training it.
@dgschwend Got it! Thanks a lot!
@williamjames1 @Grabber @dgschwend @forresti
By the way, we have recently released CK-Caffe, a framework for collaborative performance analysis and optimisation of Caffe across multiple platforms, libraries, models, etc.
For example, this Jupyter notebook compares the best performance per image across 4 CNNs and 4 BLAS libraries on a Samsung Chromebook 2 platform. When using OpenBLAS, SqueezeNet 1.1 is 2 times faster than SqueezeNet 1.0 and 2.4 times faster than AlexNet, broadly in line with expectations set by the SqueezeNet paper.
We also have comparisons for other platforms, models and optimisations. (We are discussing with our customer what we can release, and when, in addition to the core CK-Caffe framework.)
In addition, we are working on an engine for crowdsourcing benchmark results from Linux, Android, Windows, etc. platforms. Stay tuned and feel free to get in touch!
I wonder how it compares to ResNet-50 (or even ResNet-18, which would probably be closer to SqueezeNet's accuracy).
@gbrand-salesforce
That's the sort of question we are aiming to answer with CK-Caffe. If you have a deploy.prototxt and a platform of interest, we can easily run the experiments there and share the results to build common knowledge. Ping me if you are interested.
@psyhtest Looks like a very interesting project!
Feel free to benchmark my ZynqNet CNN, too, if you're interested. I started with SqueezeNet and tried to build a very well-balanced CNN architecture, which fits well onto a custom-designed FPGA accelerator (you might be interested in this part as well...).
The project report and all code from my Master Thesis "ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network" are public. :wink:
Hello all --- I'm interested in finding the fastest model architecture (i.e. the lowest number of MACCs) with reasonable accuracy (speed is more important than accuracy for me). Based on the comparisons posted here, SqueezeNet v1.1 looks like the best choice, but from my reading, the Darknet reference model (https://pjreddie.com/darknet/imagenet/#reference) and the so-called QuickNet (https://arxiv.org/pdf/1701.02291.pdf) seem faster. However, I have not been able to find Caffe implementations of these. Ideally, I would like to train using Caffe in DIGITS, but I do not have the experience to implement these in Caffe from scratch.
Any thoughts or recommendations here?
In my tests using Caffe, SqueezeNet v1.1 is slightly slower than AlexNet (I was using the built-in tool to measure forward-pass performance): https://github.com/mrgloom/kaggle-dogs-vs-cats-solution
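For anyone who wants to repeat such a measurement from Python, here is a rough pycaffe sketch of timing the forward pass; the prototxt/caffemodel file names below are placeholders, not files from this repo.

```python
# Rough forward-pass timing sketch with pycaffe (file names are placeholders).
import time
import caffe

caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

net.forward()  # warm-up pass so allocations/autotuning don't skew the timing

n_iters = 50
start = time.time()
for _ in range(n_iters):
    net.forward()
avg_ms = (time.time() - start) / n_iters * 1000
print("average forward pass: %.1f ms" % avg_ms)
```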
Off-topic: regarding layer activation and weight visualization, NVIDIA DIGITS can do this, but Netscope has nicer visualization of the networks.
The GoogLeNet page on all the Netscope analyzers out there shows the wrong number of MACCs (it's 10 times too high). Do you know why that is? @dgschwend https://dgschwend.github.io/netscope/#/preset/googlenet
@anuragmundhada What's your golden reference regarding the MACCs? Let's open an issue on the http://github.com/dgschwend/netscope project for that discussion...
I took the reference from the Inception-v2 paper, "Rethinking the Inception Architecture for Computer Vision" (https://arxiv.org/pdf/1512.00567.pdf). Table 3 states that the cost is 1.5 Bn ops, which should correspond to 1.5G MACCs, if I am not wrong.
Opening an issue on your repo. I mentioned it here only because you had posted a comment above and seemed to have faced the same problem before correcting it.
Hi, I want to know the performance improvement of SqueezeNet relative to AlexNet. Any idea?
Thanks, William J.