yumianhuli2 opened this issue 2 years ago
This repo uses pytorch-lightning as the trainer, so you can run single-GPU or multi-GPU training simply by setting the number of GPUs:
python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus #YOUR_GPU_NUM
If I remember correctly, by default this uses PyTorch's DDP spawn strategy for multi-GPU training. If you want plain DDP instead (which is generally faster than DDP spawn), you can add one line to train.py:
parser.add_argument('--strategy', type=str, default='ddp')
Let me know if it works.
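For context, here is a minimal sketch of how that flag would reach the Trainer, assuming train.py builds its Trainer from the parsed arguments via pytorch-lightning 1.x's `from_argparse_args` helper (the actual wiring in this repo may differ):

```python
import pytorch_lightning as pl
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--gpus', type=int, default=1)
# The suggested one-liner: select the distributed strategy explicitly.
parser.add_argument('--strategy', type=str, default='ddp')
args = parser.parse_args()

# In pytorch-lightning 1.x, from_argparse_args copies any namespace fields
# that match Trainer's signature, so this effectively builds
# Trainer(gpus=args.gpus, strategy='ddp', ...).
trainer = pl.Trainer.from_argparse_args(args)
```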
@yumianhuli2 To reproduce the results in the paper with multi-GPU training, please also make sure that the effective batch size (batch_size * gpu_num) is 32. For example, if you use 4 GPUs, the batch size per GPU should be 8:
python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 4 --train_batch_size 8
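For other GPU counts, the same rule applies: keep batch_size * gpu_num = 32. For instance:

python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 1 --train_batch_size 32
python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 2 --train_batch_size 16
python train.py --root /path/to/dataset_root/ --embed_dim 128 --gpus 8 --train_batch_size 4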
Thank you!
Thank you for your outstanding work! If the batch size is changed, does the learning rate need to be adjusted accordingly?
@tandangzuoren I believe the learning rate should be adjusted. The number of epochs may also need to be changed.
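For reference, a common heuristic here (not something the authors confirmed in this thread) is the linear scaling rule from Goyal et al., 2017: scale the learning rate in proportion to the effective batch size. A minimal sketch; the base values below are placeholders, so check train.py for this repo's actual defaults:

```python
def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    # Linear scaling rule: keep lr / effective_batch_size constant.
    return base_lr * new_batch_size / base_batch_size

# Example with placeholder values: if the lr were 5e-4 at effective batch
# size 32, moving to an effective batch size of 64 would suggest 1e-3.
print(scaled_lr(5e-4, 32, 64))  # 0.001
```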
@ZikangZhou Thank you for your advice on this. May I know why 32 is the effective batch size?
Hello! How do I use the GPU (or multiple cards) for training? By default, training seems to run on the CPU rather than card 0. Thank you!