forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
BSD 2-Clause "Simplified" License

Comparing GPU memory usage to BVLC reference CaffeNet #7

gavinmh closed this issue 8 years ago

gavinmh commented 8 years ago

Thanks for sharing this work. I am comparing the GPU memory utilization of the BVLC CaffeNet and SqueezeNet. The GPU memory usage is not what I expected on Ubuntu 14.04 with a Titan X.

Idle:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   129MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
+-----------------------------------------------------------------------------+

After loading a caffe.Classifier with SqueezeNet's weights and deploy.prototxt with PyCaffe in a Jupyter notebook:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   131MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     13713    C   /usr/bin/python                                229MiB |
+-----------------------------------------------------------------------------+

While classifying with SqueezeNet (t = timeit.Timer('net.predict([image], oversample=True).flatten().argsort()[:5]', 'from __main__ import net, image'); t.timeit(100)):

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         106MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   137MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     13713    C   /usr/bin/python                                543MiB |
+-----------------------------------------------------------------------------+

BVLC CaffeNet Comparison

Idle:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     337MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   133MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
+-----------------------------------------------------------------------------+

After creating a CaffeNet caffe.Classifier:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     338MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   139MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     14231    C   /usr/bin/python                                184MiB |
+-----------------------------------------------------------------------------+

While classifying with CaffeNet (t = timeit.Timer('net.predict([image], oversample=True).flatten().argsort()[:5]', 'from __main__ import net, image'); t.timeit(100)):

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1504    G   /usr/bin/X                                     338MiB |
|    0      2631    G   compiz                                         113MiB |
|    0      3502    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   139MiB |
|    0     10627    G   /usr/bin/nvidia-settings                        22MiB |
|    0     14231    C   /usr/bin/python                                465MiB |
+-----------------------------------------------------------------------------+

SqueezeNet appears to use more GPU memory than the reference BVLC CaffeNet. Am I missing something?
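For anyone repeating this comparison, here is a rough sketch of how the per-process numbers could be collected from a script instead of being read off the table by hand. It assumes `nvidia-smi` is on the PATH and that its `--query-compute-apps=pid,used_memory` query prints one `pid, MiB` row per compute process; the helper names `parse_compute_apps` and `query_compute_apps` are made up for this example. The last lines just subtract the figures reported above to show how much each Python process grew between loading the net and classifying.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' rows (no header, no units) into {pid: MiB}."""
    usage = {}
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip blank rows and rows where the driver reports no number (e.g. "[N/A]").
        if len(fields) != 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        usage[int(fields[0])] = int(fields[1])
    return usage


def query_compute_apps():
    """Ask nvidia-smi for the memory used by each compute (type C) process."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_compute_apps(out)


# Figures for /usr/bin/python copied from the tables above, in MiB:
# (after loading the net, while classifying).
reported = {"SqueezeNet": (229, 543), "CaffeNet": (184, 465)}
growth = {name: busy - loaded for name, (loaded, busy) in reported.items()}

for name, mib in sorted(growth.items()):
    print("%s grew by %d MiB during classification" % (name, mib))
```

Calling `query_compute_apps()` before and after `net.predict(...)` and looking up the notebook's PID would give the same two readings per network. Note that this query only lists compute processes, so X, compiz and the other graphics (type G) entries would not appear.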

forresti commented 8 years ago

That's possible. SqueezeNet has 50x fewer weights than AlexNet, but the activations are not particularly small.
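To make the activation point concrete, here is a back-of-the-envelope sketch for the first convolution layer only. It is not a full memory model. The assumptions: a 227x227 input, conv1 of SqueezeNet v1.0 as 96 filters of 7x7 with stride 2, conv1 of CaffeNet as 96 filters of 11x11 with stride 4, float32 values, and a batch of 10 crops per image because `oversample=True` is used. The function names are invented for this example.

```python
def conv_output_size(input_size, kernel, stride, pad=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * pad - kernel) // stride + 1


def activation_bytes(channels, size, batch=1, bytes_per_value=4):
    """Bytes needed to hold one square output blob of float32 values."""
    return batch * channels * size * size * bytes_per_value


# Assumed conv1 settings (see the note above); input is 227x227.
squeezenet_side = conv_output_size(227, kernel=7, stride=2)
caffenet_side = conv_output_size(227, kernel=11, stride=4)

batch = 10  # assumed: oversample=True classifies 10 crops per image
squeezenet_conv1 = activation_bytes(96, squeezenet_side, batch)
caffenet_conv1 = activation_bytes(96, caffenet_side, batch)

print("SqueezeNet conv1 output: %dx%d, %.1f MB" %
      (squeezenet_side, squeezenet_side, squeezenet_conv1 / 1e6))
print("CaffeNet conv1 output:   %dx%d, %.1f MB" %
      (caffenet_side, caffenet_side, caffenet_conv1 / 1e6))
```

Under these assumptions the SqueezeNet conv1 output is about four times larger than CaffeNet's, because the smaller stride keeps the feature map at 111x111 instead of 55x55. Weights are stored once, but activations scale with the batch size, so a small weight file does not by itself imply a small GPU footprint.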

We have just posted a pre-release version of SqueezeNet_v1.1: https://github.com/DeepScale/SqueezeNet/tree/squeezenet_v1.1_preRelease

Compared to SqueezeNet v1.0, here is what v1.1 provides:

  1. smaller activations (i.e. less memory utilization)
  2. less computation per image (2.2x less than v1.0)
  3. equivalent accuracy

We haven't put up a deploy.prototxt of SqueezeNet v1.1 yet, but see this post for how to create your own: https://github.com/DeepScale/SqueezeNet/issues/1

gavinmh commented 8 years ago

Thanks!