frgfm / torch-scan

Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.)
https://frgfm.github.io/torch-scan/latest
Apache License 2.0

fix: Fixed GPU RAM estimation #64

Closed frgfm closed 2 years ago

frgfm commented 2 years ago

This PR fixes the GPU RAM estimation problem by:

What this PR will not solve:

Closes #63

cc @joonas-yoon
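
For context, torchscan's "Framework & CUDA overhead" figure appears to come from measuring the GPU RAM used by the current process and subtracting the model's own parameter/buffer footprint. Below is a minimal illustrative sketch of that kind of per-process measurement (an editor's assumption for illustration, not the PR's actual code), assuming nvidia-smi is available on the machine:

import os
import subprocess

def process_gpu_ram_mb(pid: int) -> float:
    """Return the GPU RAM used by `pid` in MB, or 0.0 if it cannot be read."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return 0.0  # no NVIDIA driver or no GPU: report zero rather than crash
    for line in out.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[0] and int(fields[0]) == pid:
            return float(fields[1])
    return 0.0

ram = process_gpu_ram_mb(os.getpid())  # the overhead estimate would then be
                                       # this value minus the model's footprint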

codecov[bot] commented 2 years ago

Codecov Report

Merging #64 (76aca8b) into main (f11e201) will decrease coverage by 1.42%. The diff coverage is 40.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
- Coverage   94.35%   92.93%   -1.43%     
==========================================
  Files          10       10              
  Lines         656      665       +9     
==========================================
- Hits          619      618       -1     
- Misses         37       47      +10     
Impacted Files                 Coverage Δ
torchscan/crawler.py           84.32% <ø> (ø)
torchscan/process/memory.py    39.13% <40.00%> (-32.30%) ⬇️
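
For reference, the percentages follow from the raw counts above: 618 hits / 665 lines ≈ 92.93% on this branch, versus 619 / 656 ≈ 94.35% on main.
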
joonas-yoon commented 2 years ago

I checked out this branch first and installed it in the notebook with the following commands:

import sys
!{sys.executable} -m pip uninstall torchscan -y
!{sys.executable} -m pip install torchscan/.

and got this result:

Processing ./torchscan
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing wheel metadata ... done
Requirement already satisfied: torch>=1.5.0 in /home/jupyter/.conda/envs/joonas/lib/python3.9/site-packages (from torchscan==0.1.2.dev0) (1.11.0)
Requirement already satisfied: typing-extensions in /home/jupyter/.conda/envs/joonas/lib/python3.9/site-packages (from torch>=1.5.0->torchscan==0.1.2.dev0) (3.7.4.3)
Building wheels for collected packages: torchscan
  Building wheel for torchscan (PEP 517) ... done
  Created wheel for torchscan: filename=torchscan-0.1.2.dev0-py3-none-any.whl size=30391 sha256=9fb4bc758c8f16683bdef0ec1cf9cd684a9a6d15d04eac11f02ab15cd39cb0da
  Stored in directory: /tmp/pip-ephem-wheel-cache-te9qtths/wheels/73/72/2c/7aef77450243410db62e4ec62b085f39cdaaf84259bda8aef1
Successfully built torchscan
Installing collected packages: torchscan
Successfully installed torchscan-0.1.2.dev0

But it still prints negative sizes:

Model size (params + buffers): 13.65 Mb
Framework & CUDA overhead: -24.21 Mb
Total RAM usage: -10.56 Mb
frgfm commented 2 years ago

I checked out this branch first and installed it in the notebook with the following commands:

import sys
!{sys.executable} -m pip uninstall torchscan -y
!{sys.executable} -m pip install torchscan/.

Thanks, but are you positive this is the snippet you used to install it? If so, apart from checking out the branch, you need to install from the folder, which is called "torch-scan", not "torchscan". So I think it should be:

!{sys.executable} -m pip install -e torch-scan/.

Let me know if that fixes the problem :)

joonas-yoon commented 2 years ago

D'oh! I missed the -e option; I will try again.

The reason for "torchscan" is that it is the name of the directory I unzipped into.

Thanks for letting me know :)

joonas-yoon commented 2 years ago

Script

netG = Generator().to(device)
summary(netG, (nz, 1, 1))
netD = Discriminator().to(device)
summary(netD, (3, 64, 64))
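
For readers reproducing this: Generator and Discriminator here look like the models from the PyTorch DCGAN tutorial (the Discriminator mirrors the Generator with Conv2d and LeakyReLU layers). As a hedged sketch (assumed, not taken from the actual notebook), here is a Generator definition consistent with the parameter counts in the summary below, with nz = 100 implied by the first layer's 819,200 params (nz * 512 * 4 * 4):

import torch.nn as nn

nz = 100  # latent vector size (assumed; consistent with the param counts)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False),   # (nz,1,1) -> (512,4,4)
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # -> (256,8,8)
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # -> (128,16,16)
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # -> (64,32,32)
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),     # -> (3,64,64)
            nn.Tanh(),
        )

    def forward(self, x):
        return self.main(x)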

Output

______________________________________________________________
Layer        Type               Output Shape         Param #  
==============================================================
generator    Generator          (-1, 3, 64, 64)      0        
├─main       Sequential         (-1, 3, 64, 64)      0        
|    └─0     ConvTranspose2d    (-1, 512, 4, 4)      819,200  
|    └─1     BatchNorm2d        (-1, 512, 4, 4)      2,049    
|    └─2     ReLU               (-1, 512, 4, 4)      0        
|    └─3     ConvTranspose2d    (-1, 256, 8, 8)      2,097,152
|    └─4     BatchNorm2d        (-1, 256, 8, 8)      1,025    
|    └─5     ReLU               (-1, 256, 8, 8)      0        
|    └─6     ConvTranspose2d    (-1, 128, 16, 16)    524,288  
|    └─7     BatchNorm2d        (-1, 128, 16, 16)    513      
|    └─8     ReLU               (-1, 128, 16, 16)    0        
|    └─9     ConvTranspose2d    (-1, 64, 32, 32)     131,072  
|    └─10    BatchNorm2d        (-1, 64, 32, 32)     257      
|    └─11    ReLU               (-1, 64, 32, 32)     0        
|    └─12    ConvTranspose2d    (-1, 3, 64, 64)      3,072    
|    └─13    Tanh               (-1, 3, 64, 64)      0        
==============================================================
Trainable params: 3,576,704
Non-trainable params: 0
Total params: 3,576,704
--------------------------------------------------------------
Model size (params + buffers): 13.65 Mb
Framework & CUDA overhead: 1914.35 Mb
Total RAM usage: 1928.00 Mb
--------------------------------------------------------------
Floating Point Operations on forward: 857.74 MFLOPs
Multiply-Accumulations on forward: 428.96 MMACs
Direct memory accesses on forward: 432.46 MDMAs
______________________________________________________________
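
A quick sanity check on the reported size, assuming float32 storage (4 bytes per element) and counting each BatchNorm's running stats and batch counter as buffers:

params = 3_576_704                       # trainable params reported above
buffers = 1025 + 513 + 257 + 129         # per-BatchNorm buffer elements
print((params + buffers) * 4 / 1024**2)  # ~13.65, matching "13.65 Mb"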

________________________________________________________________
Layer            Type             Output Shape         Param #  
================================================================
discriminator    Discriminator    (-1, 1, 1, 1)        0        
├─main           Sequential       (-1, 1, 1, 1)        0        
|    └─0         Conv2d           (-1, 64, 32, 32)     3,072    
|    └─1         LeakyReLU        (-1, 64, 32, 32)     0        
|    └─2         Conv2d           (-1, 128, 16, 16)    131,072  
|    └─3         BatchNorm2d      (-1, 128, 16, 16)    513      
|    └─4         LeakyReLU        (-1, 128, 16, 16)    0        
|    └─5         Conv2d           (-1, 256, 8, 8)      524,288  
|    └─6         BatchNorm2d      (-1, 256, 8, 8)      1,025    
|    └─7         LeakyReLU        (-1, 256, 8, 8)      0        
|    └─8         Conv2d           (-1, 512, 4, 4)      2,097,152
|    └─9         BatchNorm2d      (-1, 512, 4, 4)      2,049    
|    └─10        LeakyReLU        (-1, 512, 4, 4)      0        
|    └─11        Conv2d           (-1, 1, 1, 1)        8,192    
================================================================
Trainable params: 2,765,568
Non-trainable params: 0
Total params: 2,765,568
----------------------------------------------------------------
Model size (params + buffers): 10.56 Mb
Framework & CUDA overhead: 1923.74 Mb
Total RAM usage: 1934.30 Mb
----------------------------------------------------------------
Floating Point Operations on forward: 208.47 MFLOPs
Multiply-Accumulations on forward: 104.11 MMACs
Direct memory accesses on forward: 106.95 MDMAs
________________________________________________________________
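
Note that in both summaries the FLOP count is roughly twice the MAC count, as expected, since each multiply-accumulate is one multiplication plus one addition.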

Installed the version at commit 76aca8b.

There are no more negatives 👍

frgfm commented 2 years ago

Ah perfect :)