AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

How to calculate the GPU memory needed? #4352

Open samux87 opened 4 years ago

samux87 commented 4 years ago

Hi guys, I have calculated the memory just for the image blob during the forward pass like this:

layer filters size/strd(dil) input output memory  
0 conv 32 3 x 3/ 1 1504 x1504 x 3 -> 1504 x1504 x 32 3.909 BF 72384512  
1 conv 64 3 x 3/ 2 1504 x1504 x 32 -> 752 x 752 x 64 20.847 BF 36192256  
2 conv 32 1 x 1/ 1 752 x 752 x 64 -> 752 x 752 x 32 2.316 BF 18096128  
3 conv 64 3 x 3/ 1 752 x 752 x 32 -> 752 x 752 x 64 20.847 BF 36192256  
4 Shortcut Layer: 1    
5 conv 128 3 x 3/ 2 752 x 752 x 64 -> 376 x 376 x 128 20.847 BF 18096128  
6 conv 64 1 x 1/ 1 376 x 376 x 128 -> 376 x 376 x 64 2.316 BF 9048064  
7 conv 128 3 x 3/ 1 376 x 376 x 64 -> 376 x 376 x 128 20.847 BF 18096128  
8 Shortcut Layer: 5    
9 conv 64 1 x 1/ 1 376 x 376 x 128 -> 376 x 376 x 64 2.316 BF 9048064  
10 conv 128 3 x 3/ 1 376 x 376 x 64 -> 376 x 376 x 128 20.847 BF 18096128  
11 Shortcut Layer: 8    
12 conv 256 3 x 3/ 2 376 x 376 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
13 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
14 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
15 Shortcut Layer: 12    
16 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
17 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
18 Shortcut Layer: 15    
19 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
20 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
21 Shortcut Layer: 18    
22 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
23 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
24 Shortcut Layer: 21    
25 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
26 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
27 Shortcut Layer: 24    
28 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
29 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
30 Shortcut Layer: 27    
31 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
32 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
33 Shortcut Layer: 30    
34 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
35 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
36 Shortcut Layer: 33    
37 conv 512 3 x 3/ 2 188 x 188 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
38 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
39 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
40 Shortcut Layer: 37    
41 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
42 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
43 Shortcut Layer: 40    
44 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
45 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
46 Shortcut Layer: 43    
47 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
48 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
49 Shortcut Layer: 46    
50 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
51 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
52 Shortcut Layer: 49    
53 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
54 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
55 Shortcut Layer: 52    
56 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
57 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
58 Shortcut Layer: 55    
59 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
60 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
61 Shortcut Layer: 58    
62 conv 1024 3 x 3/ 2 94 x 94 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
63 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
64 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
65 Shortcut Layer: 62    
66 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
67 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
68 Shortcut Layer: 65    
69 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
70 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
71 Shortcut Layer: 68    
72 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
73 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
74 Shortcut Layer: 71    
75 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
76 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
77 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
78 max 5x 5/ 1 47 x 47 x 512 -> 47 x 47 x 512 0.028 BF    
79 route 77 -> 47 x 47 x 512    
80 max 9x 9/ 1 47 x 47 x 512 -> 47 x 47 x 512 0.092 BF    
81 route 77 -> 47 x 47 x 512    
82 max 13x13/ 1 47 x 47 x 512 -> 47 x 47 x 512 0.191 BF    
83 route 82 80 78 77 -> 47 x 47 x2048    
84 conv 512 1 x 1/ 1 47 x 47 x2048 -> 47 x 47 x 512 4.633 BF 1131008  
85 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
86 conv 512 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 512 2.316 BF 1131008  
87 conv 1024 3 x 3/ 1 47 x 47 x 512 -> 47 x 47 x1024 20.847 BF 2262016  
88 conv 18 1 x 1/ 1 47 x 47 x1024 -> 47 x 47 x 18 0.081 BF 39762  
89 yolo    
[yolo] params: iou loss: mse, iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00    
90 route 86 -> 47 x 47 x 512    
91 conv 256 1 x 1/ 1 47 x 47 x 512 -> 47 x 47 x 256 0.579 BF 565504  
92 upsample 2x 47 x 47 x 256 -> 94 x 94 x 256    
93 route 92 61 -> 94 x 94 x 768    
94 conv 256 1 x 1/ 1 94 x 94 x 768 -> 94 x 94 x 256 3.474 BF 2262016  
95 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
96 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
97 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
98 conv 256 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 256 2.316 BF 2262016  
99 conv 512 3 x 3/ 1 94 x 94 x 256 -> 94 x 94 x 512 20.847 BF 4524032  
100 conv 18 1 x 1/ 1 94 x 94 x 512 -> 94 x 94 x 18 0.163 BF 159048  
101 yolo    
[yolo] params: iou loss: mse, iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00    
102 route 98 -> 94 x 94 x 256    
103 conv 128 1 x 1/ 1 94 x 94 x 256 -> 94 x 94 x 128 0.579 BF 1131008  
104 upsample 2x 94 x 94 x 128 -> 188 x 188 x 128    
105 route 104 36 -> 188 x 188 x 384    
106 conv 128 1 x 1/ 1 188 x 188 x 384 -> 188 x 188 x 128 3.474 BF 4524032  
107 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
108 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
109 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
110 conv 128 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 128 2.316 BF 4524032  
111 conv 256 3 x 3/ 1 188 x 188 x 128 -> 188 x 188 x 256 20.847 BF 9048064  
112 conv 18 1 x 1/ 1 188 x 188 x 256 -> 188 x 188 x 18 0.326 BF 636192  
113 yolo    
[yolo] params: iou loss: mse, iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00    
     
Total BFLOPS: 858.348, total output elements: 502,437,050
  x 4 bytes (FP32)  = 2,009,748,200 bytes
                    = 1,962,644.727 KB
Forward:   1916.645241 MB
Fwd + Bwd: 3833.290482 MB
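The arithmetic above can be reproduced with a short script. The element count 502,437,050 is the sum of the per-layer output sizes in the table; assuming FP32 activations (4 bytes per element):

```python
# Estimate activation memory for one forward pass, assuming FP32 (4 bytes/element).
# 502_437_050 is the total number of output elements summed over all layers above.
elements = 502_437_050
bytes_fp32 = elements * 4            # 2_009_748_200 bytes
kb = bytes_fp32 / 1024               # ~1_962_644.7 KB
mb = kb / 1024                       # ~1916.6 MB for the forward pass
fwd_bwd_mb = mb * 2                  # ~3833.3 MB if backward needs as much again

print(f"forward: {mb:.1f} MB, fwd+bwd: {fwd_bwd_mb:.1f} MB")
```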

So, in the end, I get ~2 GB per image for the forward pass, with batch = 64.

64 * 2 GB = 128 GB (just for forwarding; on top of that we should add memory for the parameters, the backward pass, etc.).

This doesn't make sense to me, because the GPU memory I actually used was only about 16 GB. Do you have any idea why the math doesn't match reality?

Thank you, Sam.

AlexeyAB commented 4 years ago

mini_batch=batch/subdivisions
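In darknet, the GPU only holds one mini-batch of images at a time, so activation memory scales with mini_batch rather than with the full batch. A rough sketch of the implied budget (the ~1916.6 MB per-image figure is the estimate from the calculation above, not a measured value):

```python
# Rough estimate: GPU activation memory scales with the mini-batch, not the
# full batch, because darknet processes subdivisions sequentially.
batch = 64
subdivisions = 64
mini_batch = batch // subdivisions        # images resident on the GPU at once

per_image_fwd_mb = 1916.6                 # estimated above for 1504x1504 input
activation_mb = mini_batch * per_image_fwd_mb
print(f"mini_batch={mini_batch}, ~{activation_mb:.0f} MB of forward activations")
```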

samux87 commented 4 years ago

Hmm... in my case I have batch: 64, subdivisions: 64,

so mini-batch: 1.

Does this mean that only 1 image is used per iteration?

If yes, why do I get an "out of memory" error when I increase the input image size above 1504? (I used a p3.2xlarge.)

Instance size: p3.2xlarge
GPUs (Tesla V100): 1 | Peer-to-peer: N/A | GPU memory: 16 GB | vCPUs: 8 | Memory: 61 GB
Network bandwidth: up to 10 Gb/s | EBS bandwidth: 1.5 Gbps
On-demand price: $3.06/h | Effective 1-yr reserved: $1.99/h | Effective 3-yr reserved: $1.05/h
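One plausible explanation for the OOM: convolutional activation maps grow with width x height, so activation memory grows roughly quadratically with the network input size, even at mini_batch=1. A hedged back-of-the-envelope estimate (using the ~1916.6 MB figure computed earlier as the baseline; real usage also includes weights, workspace buffers, etc.):

```python
# Hypothetical scaling estimate: activation memory ~ (input size)^2,
# since every feature map's width and height scale with the network size.
base_size = 1504
base_fwd_mb = 1916.6   # estimated forward-pass activations at 1504x1504

def estimated_fwd_mb(size: int) -> float:
    """Scale the baseline estimate quadratically with input resolution."""
    return base_fwd_mb * (size / base_size) ** 2

for size in (1504, 1696, 1888):
    print(size, round(estimated_fwd_mb(size)), "MB (estimate)")
```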
AlexeyAB commented 4 years ago

Does this mean that only 1 image is used per iteration?

64 images are used per iteration.

mini_batch=1, batch=64

The forward-backward pass is run for 1 image at a time; the weights update happens once per 64 images.
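In other words, darknet accumulates gradients across subdivisions. A schematic of one training iteration (not darknet's actual source; the function names are illustrative):

```python
# Schematic of one darknet training iteration with batch=64, subdivisions=64.
# Illustrates gradient accumulation only; not the real darknet code.
batch = 64
subdivisions = 64
mini_batch = batch // subdivisions   # images on the GPU at any one time

def train_iteration(images, forward_backward, update_weights):
    """Run forward/backward per mini-batch; update weights once per batch."""
    grads = None
    for i in range(subdivisions):
        chunk = images[i * mini_batch:(i + 1) * mini_batch]
        g = forward_backward(chunk)                       # grads for mini_batch images
        grads = g if grads is None else [a + b for a, b in zip(grads, g)]
    update_weights(grads)                                 # one update covering all 64 images
```

This is why only mini_batch images' worth of activations must fit in GPU memory, while the weight update still reflects the full batch of 64.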