Closed xiaolinyezi closed 2 years ago
Hi, could you post the command you used?
To train the model on a single machine with 4 GPUs, you can run a command like this:
tools/dist_train.sh <CONFIG_FILE> 4 --options model.pretrained=<PRETRAIN_MODEL> [other options]
Running python -m torch.distributed.launch --nproc_per_node=4 tools/train.py --config configs/swin/upernet_swin_small_patch4_window7_512x512_160k_pascalVoc.py --launcher pytorch seems to work.
When I set --gpus 4 and --gpu-ids [0,1,2,3], I get the following error:

File "/home/ahs/anaconda3/envs/swintrans-xiaolin/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 55, in train_step
AssertionError: MMDataParallel only supports single GPU training, if you need to train with multiple GPUs, please use MMDistributedDataParallel instead.
When I set --gpus 1 and --gpu-ids [0], it works well.
How can I fix this error? Thanks a lot.
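For context, the assertion comes from the non-distributed code path: without --launcher, tools/train.py wraps the model in MMDataParallel, which only supports a single GPU; with --launcher pytorch, each launched process owns one GPU and the model is wrapped in MMDistributedDataParallel. Below is a minimal, simplified sketch of that dispatch logic (choose_wrapper is a hypothetical helper, not the actual mmseg code):

```python
def choose_wrapper(launcher: str, num_gpus: int) -> str:
    """Sketch of how the training script picks a parallel wrapper.

    launcher: 'none' when no --launcher flag is passed, 'pytorch' when
    the script is started via torch.distributed.launch.
    """
    if launcher == 'none':
        # Non-distributed path: the model goes into MMDataParallel,
        # whose train_step asserts that only a single GPU is used.
        if num_gpus > 1:
            raise AssertionError(
                'MMDataParallel only supports single GPU training, '
                'if you need to train with multiple GPUs, please use '
                'MMDistributedDataParallel instead.')
        return 'MMDataParallel'
    # Distributed path: one process per GPU, each wrapping its model
    # replica in MMDistributedDataParallel.
    return 'MMDistributedDataParallel'


# --gpus 1, no launcher: works.
print(choose_wrapper('none', 1))        # MMDataParallel
# --launcher pytorch with 4 processes: works.
print(choose_wrapper('pytorch', 4))     # MMDistributedDataParallel
```

So --gpus 4 without a launcher hits the assertion by design; the torch.distributed.launch command (or tools/dist_train.sh, which wraps it) is the intended way to use multiple GPUs.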