NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Only one GPU does work during distributed training #787

Open ggaemo opened 4 years ago

ggaemo commented 4 years ago

I have 4 GPUs, and when I run distributed training in my code, which follows the ImageNet example,

my nvidia-smi output looks like this:

[nvidia-smi screenshot]

The image shows that gpu:1 through gpu:3 are active, but only while the model is being loaded onto the GPUs. Once the actual backprop starts, the process runs only on gpu:0, and gpu:1 through gpu:3 do nothing (GPU utilization is 0).

Is there any possible reason for this? Could this be an issue with the data loader? (I did not use the data_prefetcher as in the ImageNet example.)
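For reference, the per-process setup in the ImageNet example looks roughly like the sketch below (not my exact code; `build_model()` is just a placeholder):

```python
# Rough sketch of the per-process setup from the apex ImageNet example.
# Assumes launch via `python -m torch.distributed.launch --nproc_per_node=4 main.py`,
# which passes --local_rank to each process. build_model() is a placeholder.
import argparse
import torch
import torch.distributed as dist
from apex import amp
from apex.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)                      # bind this process to its own GPU
dist.init_process_group(backend='nccl', init_method='env://')

model = build_model().cuda()                                # lands on the device selected above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model = DDP(model)                                          # apex DDP wraps the per-process model

# The DataLoader should also use torch.utils.data.distributed.DistributedSampler
# so each rank sees a different shard of the dataset.
```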

ggaemo commented 4 years ago

I think I might have found the culprit. When sending the model to the GPU, I used:

model = model.to(torch.device('cuda:0'))

However, the example code used:

model = model.cuda()

As far as I know, the recommended way of sending a model to a GPU in PyTorch is the former, isn't it?
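For context, the difference matters in a multi-process setup: `.to(torch.device('cuda:0'))` puts every rank's model on GPU 0, while `.cuda()` with no argument uses the device previously selected with `torch.cuda.set_device`. A minimal sketch, assuming `local_rank` comes from the launcher:

```python
# Minimal sketch of the difference; `local_rank` is assumed to come from the launcher.
import torch

torch.cuda.set_device(local_rank)                 # each process selects its own GPU first

m1 = model.to(torch.device('cuda:0'))             # pins the model to GPU 0 in every process
m2 = model.cuda()                                 # uses the current device, i.e. cuda:{local_rank}
m3 = model.to(torch.device('cuda', local_rank))   # explicit equivalent of .cuda() here
```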