Closed shantanudev closed 7 years ago
I haven't got a multi-GPU node to test this on, but have you set the -nGPU
flag correctly like below?
th Train.lua -nGPU 8
@SeanNaren Yes, I have done this. Basically it limits me to a batch size of about 30 even though I have 8 GPUs.
I just ran this on our internal AWS K80 server and it worked fine. The server was already running another job; however, all GPUs were used when I ran th Train.lua -nGPU
. Are you using the latest branch?
@SeanNaren Hmm, let me do some investigation on my end. I will let you know.
Hi Sean,
I was wondering if you have faced an issue where not all of the GPUs are being utilized, as evident in the output below. Also, it will not let me use a larger batch size even though I have more GPUs.
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      6318    C   /home/ec2-user/src/torch/install/bin/luajit 10757MiB  |
|    1      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    2      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    3      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    4      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    5      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    6      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
|    7      6318    C   /home/ec2-user/src/torch/install/bin/luajit   149MiB  |
+-----------------------------------------------------------------------------+
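For context on what this output suggests: Torch training scripts typically get multi-GPU data parallelism by wrapping the model in nn.DataParallelTable, which replicates the model on each listed GPU and splits every minibatch among the replicas. A minimal sketch of that pattern (illustrative only; `makeDataParallel` is a hypothetical helper, not necessarily what this repo's Train.lua does, and it requires torch, cunn, cutorch, and real GPUs):

```lua
-- Hedged sketch of the usual Torch data-parallel wrapping pattern.
require 'cunn'

local function makeDataParallel(model, nGPU)
   if nGPU > 1 then
      -- Split along dimension 1 (the batch dimension); each GPU in the
      -- list gets a replica of the model and a slice of the minibatch.
      local dpt = nn.DataParallelTable(1)
      dpt:add(model, torch.range(1, nGPU):totable())
      return dpt:cuda()
   end
   return model:cuda()
end
```

If this wrapping is applied, each replica holds a full copy of the model's parameters, so every GPU should show substantial memory usage. An output like the one above, with ~10.7 GiB on GPU 0 and only ~149 MiB (roughly a bare CUDA context) on the rest, is consistent with the model and the whole batch living on a single GPU, which would also explain why the batch size stays capped regardless of GPU count.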