Currently CUDA seems to be taking ~9 GB of GPU memory (rendering is taking another ~4 GB):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:17:00.0 On | N/A |
| 18% 59C P5 46W / 250W | 5274MiB / 10988MiB | 31% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:65:00.0 Off | N/A |
| 30% 66C P2 81W / 250W | 8856MiB / 10989MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
This seems to suggest it is taking ~3x the memory we expected.
Did we miss anything in the calculation?
Thanks,
Le
-----some details-----
conv_layer_params = ((16, 3, 2), (32, 3, 2))
1st conv layer 40x40x16, 2nd conv layer 20x20x32, which roughly adds up to 2x the input layer;
then 2x for the actor and critic networks, and 2x again for forward and backprop;
plus 4x the input layer for FrameStack:
= 8x + 4x = 12x the size of the input layer
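A quick sanity check of those activation counts and the 12x multiplier (a sketch assuming an 80x80x3 input and stride-2 convolutions with "same" padding, matching conv_layer_params above):

```python
# Per-frame element counts, assuming 80x80x3 input and stride-2 "same" convs
input_elems = 80 * 80 * 3    # 19,200
conv1_elems = 40 * 40 * 16   # 25,600 (stride 2 halves each spatial dim)
conv2_elems = 20 * 20 * 32   # 12,800
conv_total = conv1_elems + conv2_elems

print(conv_total / input_elems)  # 2.0 -> conv activations ~= 2x the input

# 2x (convs) * 2 (actor + critic) * 2 (forward + backprop) = 8x,
# plus 4x for the FrameStack copies of the input
multiplier = 2 * 2 * 2 + 4
print(multiplier)                # 12
```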
4 bytes (float) x 80 x 80 (image size) x 3 (channels) x 100 (unroll length) x 12 (input + two conv layers + backprop + framestack) x 30 (parallel envs) / 1,000,000,000 = ~2.7 GB
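The same estimate in code (a sketch of the arithmetic above, not a measurement of what the framework actually allocates):

```python
bytes_per_float = 4
frame_elems = 80 * 80 * 3  # image size x channels
unroll_length = 100
multiplier = 12            # input + conv layers + actor/critic + backprop + framestack
parallel_envs = 30

total_bytes = bytes_per_float * frame_elems * unroll_length * multiplier * parallel_envs
print(total_bytes / 1e9)   # ~2.76 GB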