ZikangZhou / HiVT

[CVPR 2022] HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf
Apache License 2.0

Use GPU and multi-card for model training #3

Open yumianhuli2 opened 2 years ago

yumianhuli2 commented 2 years ago

Hello! How can I use a GPU, and multiple cards, for training? By default (device 0), training runs on the CPU. Thank you!

ZikangZhou commented 2 years ago

This repo uses pytorch-lightning as the trainer, so it's convenient to do single-GPU or multi-GPU training by simply setting the number of GPUs:

python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus #YOUR_GPU_NUM

If I remember correctly, this will use the PyTorch DDP Spawn strategy for multi-GPU training by default. If you want to use PyTorch DDP instead (which is generally faster than DDP Spawn), you can add one line to train.py:

parser.add_argument('--strategy', type=str, default='ddp')

Let me know if it works.
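For reference, a minimal sketch of how these flags could be consumed in a train.py like this one (assuming a PyTorch Lightning 1.x setup where the Trainer is built from the parsed args; the actual script may wire things up differently):

from argparse import ArgumentParser
import pytorch_lightning as pl

parser = ArgumentParser()
parser.add_argument('--root', type=str, required=True)
parser.add_argument('--embed_dim', type=int, default=128)
parser.add_argument('--gpus', type=int, default=1)
parser.add_argument('--strategy', type=str, default='ddp')  # the extra line suggested above
args = parser.parse_args()

# from_argparse_args forwards any recognized Trainer flags (gpus, strategy, ...),
# so --gpus 4 --strategy ddp launches one DDP process per GPU.
trainer = pl.Trainer.from_argparse_args(args)

With DDP Spawn, Lightning spawns the worker processes from the main process, which tends to be slower and has pickling restrictions; plain DDP runs an independent process per GPU.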

ZikangZhou commented 2 years ago

@yumianhuli2 To reproduce the results in the paper when using multi-gpu training, please also make sure that the effective batch size (batch_size * gpu_num) is 32. For example, if you use 4 gpus, then the batch size per gpu should be 8:

python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 4 --train_batch_size 8
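In other words, keep per-GPU batch size × number of GPUs equal to 32. A tiny illustrative helper (hypothetical, not part of the repo):

def per_gpu_batch_size(num_gpus, effective_batch_size=32):
    # e.g. 4 GPUs -> 8 per GPU, 2 GPUs -> 16 per GPU, 1 GPU -> 32
    assert effective_batch_size % num_gpus == 0, 'pick a GPU count that divides 32'
    return effective_batch_size // num_gpus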

yumianhuli2 commented 2 years ago

Thank you!

tandangzuoren commented 1 year ago

Thank you for your outstanding work! If the batch size is changed, does the learning rate need to be adjusted accordingly?

ZikangZhou commented 1 year ago

@tandangzuoren I believe the learning rate should be adjusted. The number of epochs may also need to be changed.
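The thread doesn't say by how much; a common heuristic (an assumption here, not something confirmed by the authors) is the linear scaling rule, where the learning rate grows in proportion to the effective batch size:

# illustrative numbers only, not the repo's actual hyperparameters
base_lr = 5e-4          # learning rate tuned for an effective batch size of 32
base_batch_size = 32
new_batch_size = 64     # e.g. 8 GPUs x 8 samples per GPU
scaled_lr = base_lr * new_batch_size / base_batch_size  # -> 1e-3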

tteokl commented 1 year ago

@ZikangZhou Thank you for your advice on this. May I know why 32 is the effective batch size?