ggaemo opened 4 years ago
I think I might have found the culprit. When sending the model to gpu, I used
model = model.to(torch.device('cuda:0'))
However, in the code it used
model = model.cuda()
As far as I know, the recommended way of sending a model to a GPU in PyTorch is the former, isn't it?
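For what it's worth, the two idioms should behave the same for the default device: `model.cuda()` is essentially shorthand for `model.to(torch.device('cuda'))`. A minimal sketch (the small `Linear` model is just for illustration, and it falls back to CPU when no GPU is visible):

```python
import torch

# Pick the target device; falls back to CPU when CUDA is unavailable.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2)

# .to() moves the parameters in place and also returns the module, so
# `model = model.to(device)` and a bare `model.to(device)` both work.
# On a CUDA machine, model.cuda() would land on the same default GPU.
model = model.to(device)

print(next(model.parameters()).device)
```

So the choice between `.to()` and `.cuda()` by itself shouldn't change which GPU the model ends up on; the difference only matters once each process needs a *specific* device.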
I have 4 GPUs, and when I run distributed training in my code, which follows the Imagenet example,
my nvidia-smi looks like this
The image shows that gpu:1 to gpu:3 are active, but only while the model is being loaded onto the GPUs. Once the actual backprop starts, the process runs only on gpu:0, and gpu:1 to gpu:3 do nothing (GPU utilization is 0).
Would there be any possible reason for this? Could this be an issue with the data loader? (I did not use the data_prefetcher as in the Imagenet example.)
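One common cause of exactly this symptom, offered only as a guess: if each spawned worker calls a bare `model.cuda()` without first pinning itself to its own GPU, every replica lands on cuda:0. A sketch of the usual per-rank setup (`setup_for_rank` and `MyModel` are hypothetical names, not from the Imagenet example):

```python
import torch

def setup_for_rank(rank: int) -> torch.device:
    # Pin this worker to its own GPU *before* building the model; without
    # torch.cuda.set_device, a bare model.cuda() defaults to cuda:0 in
    # every process, which would leave gpu:1-gpu:3 idle during backprop.
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
        return torch.device(f'cuda:{rank}')
    return torch.device('cpu')  # CPU fallback so the sketch runs anywhere

# Per-process usage inside the spawned worker (sketch):
#   device = setup_for_rank(rank)
#   model = MyModel().to(device)
#   model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
```

If the Imagenet-style setup is already doing the `set_device` call per process, then the data loader (or a missing `DistributedSampler`) would be the next thing I'd check.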