TaatiTeam / MotionAGFormer

Official implementation of the paper "MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network" (WACV 2024).
Apache License 2.0

The issue of training occupying two GPUs #10

Closed yaoyao674 closed 8 months ago

yaoyao674 commented 8 months ago

Hello author, your work is very good.

What I would like to ask is: when I execute python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml, I observe that training runs on two GPUs. Is this normal? I did not find any place in the code where the number of GPUs is set. Could you point it out? Thank you.

AsukaCamellia commented 8 months ago
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = load_model(args)
    if torch.cuda.is_available():
        model = torch.nn.DataParallel(model)
    model.to(device)

These are lines 256-260 of train.py. You could set the device_ids parameter of DataParallel as follows:

    if torch.cuda.is_available():
        model = torch.nn.DataParallel(model, device_ids=[0, 1])

You can set device_ids = [0, 1, 2, ...] to whichever GPUs in your machine you want to use.
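
For example, if you only want training to run on the first GPU, a minimal sketch of the same lines (assuming GPU 0 is the device you want) would be:

    if torch.cuda.is_available():
        # restrict DataParallel to GPU 0 so training no longer
        # spreads across every visible GPU
        model = torch.nn.DataParallel(model, device_ids=[0])
    model.to(device)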

SoroushMehraban commented 8 months ago

Thanks @AsukaCamellia for answering it. I'll close the issue.

yaoyao674 commented 8 months ago

Thanks for your help @AsukaCamellia @SoroushMehraban

zerowing-ex commented 8 months ago

Before you execute train.py, you can specify which GPUs you want to use, e.g. export CUDA_VISIBLE_DEVICES=0,1 will run the program on the first and second GPUs in the machine. Another way to do the same thing:

    CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml
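
Similarly, exposing only one device, e.g. CUDA_VISIBLE_DEVICES=0, makes PyTorch see a single GPU, so DataParallel will not spread training across two devices.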

I suggest using torch.nn.parallel.DistributedDataParallel rather than torch.nn.DataParallel if you need to train the model on multiple GPUs.
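
For reference, here is a minimal sketch of what a DistributedDataParallel setup could look like when launched with torchrun. This is not part of the repo's train.py; the Linear layer below is only a placeholder for the model that load_model(args) would return:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets LOCAL_RANK for each spawned process (one process per GPU)
        local_rank = int(os.environ["LOCAL_RANK"])
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(local_rank)

        # placeholder model; in train.py you would use load_model(args) instead
        model = torch.nn.Linear(10, 10).cuda(local_rank)
        model = DDP(model, device_ids=[local_rank])

        # ... training loop goes here; the dataloader should use a
        # DistributedSampler so each process sees a different data shard ...

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with something like torchrun --nproc_per_node=2 train.py --config configs/h36m/MotionAGFormer-xsmall.yaml, each GPU gets its own process, which usually scales better than DataParallel's single-process replication.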