frgfm / torch-scan

Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.)
https://frgfm.github.io/torch-scan/latest
Apache License 2.0

fix: Fixed GPU RAM estimation #64

Closed frgfm closed 2 years ago

frgfm commented 2 years ago

This PR fixes the GPU RAM estimation problem by:

What this PR will not solve:

Closes #63

cc @joonas-yoon
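
For context, torchscan's "Framework & CUDA overhead" figure appears to come from measuring the GPU RAM used by the current process and subtracting the model's own parameter/buffer footprint. Below is a minimal illustrative sketch of that kind of per-process measurement (an editor's assumption for illustration, not the PR's actual code), assuming nvidia-smi is available on the machine:

import os
import subprocess

def process_gpu_ram_mb(pid: int) -> float:
    """Return the GPU RAM used by `pid` in MB, or 0.0 if it cannot be read."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return 0.0  # no NVIDIA driver or no GPU: report zero rather than crash
    for line in out.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[0] and int(fields[0]) == pid:
            return float(fields[1])
    return 0.0

ram = process_gpu_ram_mb(os.getpid())  # the overhead estimate would then be
                                       # this value minus the model's footprint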

codecov[bot] commented 2 years ago

Codecov Report

Merging #64 (76aca8b) into main (f11e201) will decrease coverage by 1.42%. The diff coverage is 40.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
- Coverage   94.35%   92.93%   -1.43%     
==========================================
  Files          10       10              
  Lines         656      665       +9     
==========================================
- Hits          619      618       -1     
- Misses         37       47      +10     
Impacted Files                 Coverage Δ
torchscan/crawler.py           84.32% <ø> (ø)
torchscan/process/memory.py    39.13% <40.00%> (-32.30%) ⬇️
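
For reference, the percentages follow from the raw counts above: 618 hits / 665 lines ≈ 92.93% on this branch, versus 619 / 656 ≈ 94.35% on main.
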
joonas-yoon commented 2 years ago

I checked out this branch first and installed it in the notebook with the following commands:

import sys
!{sys.executable} -m pip uninstall torchscan -y
!{sys.executable} -m pip install torchscan/.

and got this result:

Processing ./torchscan
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing wheel metadata ... done
Requirement already satisfied: torch>=1.5.0 in /home/jupyter/.conda/envs/joonas/lib/python3.9/site-packages (from torchscan==0.1.2.dev0) (1.11.0)
Requirement already satisfied: typing-extensions in /home/jupyter/.conda/envs/joonas/lib/python3.9/site-packages (from torch>=1.5.0->torchscan==0.1.2.dev0) (3.7.4.3)
Building wheels for collected packages: torchscan
  Building wheel for torchscan (PEP 517) ... done
  Created wheel for torchscan: filename=torchscan-0.1.2.dev0-py3-none-any.whl size=30391 sha256=9fb4bc758c8f16683bdef0ec1cf9cd684a9a6d15d04eac11f02ab15cd39cb0da
  Stored in directory: /tmp/pip-ephem-wheel-cache-te9qtths/wheels/73/72/2c/7aef77450243410db62e4ec62b085f39cdaaf84259bda8aef1
Successfully built torchscan
Installing collected packages: torchscan
Successfully installed torchscan-0.1.2.dev0

But it still prints negative sizes:

Model size (params + buffers): 13.65 Mb
Framework & CUDA overhead: -24.21 Mb
Total RAM usage: -10.56 Mb
frgfm commented 2 years ago

I checked out this branch first and installed it in the notebook with the following commands:

import sys
!{sys.executable} -m pip uninstall torchscan -y
!{sys.executable} -m pip install torchscan/.

Thanks, but are you positive this is the snippet you used to install it? If so, apart from checking out the branch, you need to install from the folder, which is called "torch-scan", not "torchscan". So I think it should be:

!{sys.executable} -m pip install -e torch-scan/.

Let me know if that fixes the problem :)

joonas-yoon commented 2 years ago

D'oh! I missed the -e option; I will try again.

The reason for "torchscan" is that it is the name of the directory I unzipped into.

Thanks for letting me know :)

joonas-yoon commented 2 years ago

Script

netG = Generator().to(device)
summary(netG, (nz, 1, 1))
netD = Discriminator().to(device)
summary(netD, (3, 64, 64))
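
For readers reproducing this: Generator and Discriminator here look like the models from the PyTorch DCGAN tutorial (the Discriminator mirrors the Generator with Conv2d and LeakyReLU layers). As a hedged sketch (assumed, not taken from the actual notebook), here is a Generator definition consistent with the parameter counts in the summary below, with nz = 100 implied by the first layer's 819,200 params (nz * 512 * 4 * 4):

import torch.nn as nn

nz = 100  # latent vector size (assumed; consistent with the param counts)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False),   # (nz,1,1) -> (512,4,4)
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # -> (256,8,8)
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # -> (128,16,16)
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # -> (64,32,32)
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),     # -> (3,64,64)
            nn.Tanh(),
        )

    def forward(self, x):
        return self.main(x)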

Output

______________________________________________________________
Layer        Type               Output Shape         Param #  
==============================================================
generator    Generator          (-1, 3, 64, 64)      0        
├─main       Sequential         (-1, 3, 64, 64)      0        
|    └─0     ConvTranspose2d    (-1, 512, 4, 4)      819,200  
|    └─1     BatchNorm2d        (-1, 512, 4, 4)      2,049    
|    └─2     ReLU               (-1, 512, 4, 4)      0        
|    └─3     ConvTranspose2d    (-1, 256, 8, 8)      2,097,152
|    └─4     BatchNorm2d        (-1, 256, 8, 8)      1,025    
|    └─5     ReLU               (-1, 256, 8, 8)      0        
|    └─6     ConvTranspose2d    (-1, 128, 16, 16)    524,288  
|    └─7     BatchNorm2d        (-1, 128, 16, 16)    513      
|    └─8     ReLU               (-1, 128, 16, 16)    0        
|    └─9     ConvTranspose2d    (-1, 64, 32, 32)     131,072  
|    └─10    BatchNorm2d        (-1, 64, 32, 32)     257      
|    └─11    ReLU               (-1, 64, 32, 32)     0        
|    └─12    ConvTranspose2d    (-1, 3, 64, 64)      3,072    
|    └─13    Tanh               (-1, 3, 64, 64)      0        
==============================================================
Trainable params: 3,576,704
Non-trainable params: 0
Total params: 3,576,704
--------------------------------------------------------------
Model size (params + buffers): 13.65 Mb
Framework & CUDA overhead: 1914.35 Mb
Total RAM usage: 1928.00 Mb
--------------------------------------------------------------
Floating Point Operations on forward: 857.74 MFLOPs
Multiply-Accumulations on forward: 428.96 MMACs
Direct memory accesses on forward: 432.46 MDMAs
______________________________________________________________
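
A quick sanity check on the reported size, assuming float32 storage (4 bytes per element) and counting each BatchNorm's running stats and batch counter as buffers:

params = 3_576_704                       # trainable params reported above
buffers = 1025 + 513 + 257 + 129         # per-BatchNorm buffer elements
print((params + buffers) * 4 / 1024**2)  # ~13.65, matching "13.65 Mb"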

________________________________________________________________
Layer            Type             Output Shape         Param #  
================================================================
discriminator    Discriminator    (-1, 1, 1, 1)        0        
├─main           Sequential       (-1, 1, 1, 1)        0        
|    └─0         Conv2d           (-1, 64, 32, 32)     3,072    
|    └─1         LeakyReLU        (-1, 64, 32, 32)     0        
|    └─2         Conv2d           (-1, 128, 16, 16)    131,072  
|    └─3         BatchNorm2d      (-1, 128, 16, 16)    513      
|    └─4         LeakyReLU        (-1, 128, 16, 16)    0        
|    └─5         Conv2d           (-1, 256, 8, 8)      524,288  
|    └─6         BatchNorm2d      (-1, 256, 8, 8)      1,025    
|    └─7         LeakyReLU        (-1, 256, 8, 8)      0        
|    └─8         Conv2d           (-1, 512, 4, 4)      2,097,152
|    └─9         BatchNorm2d      (-1, 512, 4, 4)      2,049    
|    └─10        LeakyReLU        (-1, 512, 4, 4)      0        
|    └─11        Conv2d           (-1, 1, 1, 1)        8,192    
================================================================
Trainable params: 2,765,568
Non-trainable params: 0
Total params: 2,765,568
----------------------------------------------------------------
Model size (params + buffers): 10.56 Mb
Framework & CUDA overhead: 1923.74 Mb
Total RAM usage: 1934.30 Mb
----------------------------------------------------------------
Floating Point Operations on forward: 208.47 MFLOPs
Multiply-Accumulations on forward: 104.11 MMACs
Direct memory accesses on forward: 106.95 MDMAs
________________________________________________________________
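
Note that in both summaries the FLOP count is roughly twice the MAC count, as expected, since each multiply-accumulate is one multiplication plus one addition.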

Installed the version at commit 76aca8b.

There are no more negatives 👍

frgfm commented 2 years ago

Ah perfect :)