THU-MIG / RepViT

RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
https://arxiv.org/abs/2307.09283
Apache License 2.0

How can I use one GPU to train the RepViT? #6

Closed: puyiwen closed this issue 1 year ago

puyiwen commented 1 year ago

Hi, thank you for your great work! However, I only have one GPU. How should I modify the training command you provided? Do I also need to modify the learning rate and batch size? Thank you very much!

jameslahm commented 1 year ago

Hi, thanks for your interest! Single-GPU training can be enabled by setting --nproc_per_node to 1. For example,

python -m torch.distributed.launch --nproc_per_node=1 --master_port 12346 --use_env main.py --model repvit_m1 --data-path ~/imagenet --dist-eval

You can also adjust the batch size according to your GPU memory. For example, you can set it to 128 with

python -m torch.distributed.launch --nproc_per_node=1 --master_port 12346 --use_env main.py --model repvit_m1 --data-path ~/imagenet --dist-eval --batch-size 128

The learning rate does not need to be modified; it is linearly scaled by the effective batch size automatically, see https://github.com/THU-MIG/RepViT/blob/2d85a4c5b709a99fc81ffe4384333e244fda4ab5/main.py#L314-L315
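For reference, the linked lines implement the common DeiT-style linear scaling rule: the learning rate grows in proportion to the total (per-GPU batch size × number of GPUs) batch. A minimal sketch of that rule is below; the base batch size of 512 and the function name are assumptions for illustration, so check the linked main.py lines for the exact constants the repo uses.

```python
def linear_scaled_lr(base_lr: float, batch_size: int, world_size: int,
                     base_batch: int = 512) -> float:
    """Scale the learning rate linearly with the effective batch size.

    base_batch=512 is an assumed reference batch size (DeiT convention);
    the actual value is defined in the repo's main.py.
    """
    return base_lr * batch_size * world_size / base_batch

# Example: single GPU (world_size=1), batch_size=128, base LR 1e-3
lr = linear_scaled_lr(1e-3, batch_size=128, world_size=1)
print(lr)  # 0.00025
```

Because the scaling already accounts for the smaller effective batch on one GPU, no manual learning-rate change is needed.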

puyiwen commented 1 year ago


Thank you for your reply! I think I have got it.