Closed xiaolinyezi closed 2 years ago
Hi, could you post the command you used?
To train the model on a single machine with 4 GPUs, you can run a command like this:
tools/dist_train.sh <CONFIG_FILE> 4 --options model.pretrained=<PRETRAIN_MODEL> [other options]
Running python -m torch.distributed.launch --nproc_per_node=4 tools/train.py --config configs/swin/upernet_swin_small_patch4_window7_512x512_160k_pascalVoc.py --launcher pytorch seems to work.
When I set --gpus 4 and --gpu-ids [0,1,2,3], I get the following error:

File "/home/ahs/anaconda3/envs/swintrans-xiaolin/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 55, in train_step
AssertionError: MMDataParallel only supports single GPU training, if you need to train with multiple GPUs, please use MMDistributedDataParallel instead.
When I set --gpus 1 and --gpu-ids [0], it works well.
How can I fix this error? Thanks a lot.
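For context, the assertion comes from the non-distributed code path: without --launcher, tools/train.py wraps the model in MMDataParallel, which only supports a single GPU; with --launcher pytorch, each launched process owns one GPU and the model is wrapped in MMDistributedDataParallel. Below is a minimal, simplified sketch of that dispatch logic (choose_wrapper is a hypothetical helper, not the actual mmseg code):

```python
def choose_wrapper(launcher: str, num_gpus: int) -> str:
    """Sketch of how the training script picks a parallel wrapper.

    launcher: 'none' when no --launcher flag is passed, 'pytorch' when
    the script is started via torch.distributed.launch.
    """
    if launcher == 'none':
        # Non-distributed path: the model goes into MMDataParallel,
        # whose train_step asserts that only a single GPU is used.
        if num_gpus > 1:
            raise AssertionError(
                'MMDataParallel only supports single GPU training, '
                'if you need to train with multiple GPUs, please use '
                'MMDistributedDataParallel instead.')
        return 'MMDataParallel'
    # Distributed path: one process per GPU, each wrapping its model
    # replica in MMDistributedDataParallel.
    return 'MMDistributedDataParallel'


# --gpus 1, no launcher: works.
print(choose_wrapper('none', 1))        # MMDataParallel
# --launcher pytorch with 4 processes: works.
print(choose_wrapper('pytorch', 4))     # MMDistributedDataParallel
```

So --gpus 4 without a launcher hits the assertion by design; the torch.distributed.launch command (or tools/dist_train.sh, which wraps it) is the intended way to use multiple GPUs.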