ggaemo opened 4 years ago
I think I might have found the culprit. When sending the model to gpu, I used
model = model.to(torch.device('cuda:0'))
However, in the code it used
model = model.cuda()
As far as I know, the recommended way of sending a model to a GPU in PyTorch is the former, isn't it?
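For what it's worth, the two idioms should behave the same for the default device: `model.cuda()` is essentially shorthand for `model.to(torch.device('cuda'))`. A minimal sketch (the small `Linear` model is just for illustration, and it falls back to CPU when no GPU is visible):

```python
import torch

# Pick the target device; falls back to CPU when CUDA is unavailable.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2)

# .to() moves the parameters in place and also returns the module, so
# `model = model.to(device)` and a bare `model.to(device)` both work.
# On a CUDA machine, model.cuda() would land on the same default GPU.
model = model.to(device)

print(next(model.parameters()).device)
```

So the choice between `.to()` and `.cuda()` by itself shouldn't change which GPU the model ends up on; the difference only matters once each process needs a *specific* device.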
I have 4 GPUs, and when I run distributed training in my code, which follows the Imagenet example,
my nvidia-smi looks like this
The image shows that gpu:1 to gpu:3 are active, but only while the model is being loaded onto the GPUs. Once the actual backprop starts, the process runs only on gpu:0, and gpu:1 to gpu:3 do nothing (GPU utilization is 0).
Would there be any possible reason for this? Could this be an issue with the data loader? (I did not use the data_prefetcher as in the Imagenet example.)
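One common cause of exactly this symptom, offered only as a guess: if each spawned worker calls a bare `model.cuda()` without first pinning itself to its own GPU, every replica lands on cuda:0. A sketch of the usual per-rank setup (`setup_for_rank` and `MyModel` are hypothetical names, not from the Imagenet example):

```python
import torch

def setup_for_rank(rank: int) -> torch.device:
    # Pin this worker to its own GPU *before* building the model; without
    # torch.cuda.set_device, a bare model.cuda() defaults to cuda:0 in
    # every process, which would leave gpu:1-gpu:3 idle during backprop.
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
        return torch.device(f'cuda:{rank}')
    return torch.device('cpu')  # CPU fallback so the sketch runs anywhere

# Per-process usage inside the spawned worker (sketch):
#   device = setup_for_rank(rank)
#   model = MyModel().to(device)
#   model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
```

If the Imagenet-style setup is already doing the `set_device` call per process, then the data loader (or a missing `DistributedSampler`) would be the next thing I'd check.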