LikeLy-Journey / SegmenTron

Supports PointRend, Fast_SCNN, HRNet, DeepLabv3+ (Xception, ResNet, MobileNet), ContextNet, FPENet, DABNet, EDANet, ENet, ESPNetv2, RefineNet, UNet, DANet, DFANet, HardNet, LEDNet, OCNet, EncNet, DUNet, CGNet, CCNet, BiSeNet, PSPNet, ICNet, FCN, and DeepLab.
Apache License 2.0

Question about Training with Multiple GPUs #9

Closed Lubby-ch closed 4 years ago

Lubby-ch commented 4 years ago

In `$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS $(dirname "$0")/train.py $CONFIG ${@:3}`, what does `${@:3}` stand for? I have run into some problems when training with multiple GPUs that I cannot solve. Could you give me a concrete command instead of `$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS $(dirname "$0")/train.py $CONFIG ${@:3}`?

LikeLy-Journey commented 4 years ago

You can try a command like this to train DeepLab with 2 GPUs:

```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py configs/cityscapes_deeplabv3_plus.yaml
```

`${@:3}` stands for the custom arguments forwarded to `train.py`. For example, you can tell it to print a log line every 20 iterations:

```
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py configs/cityscapes_deeplabv3_plus.yaml --log-iter 20
```

which is the same as:

```
CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh configs/cityscapes_deeplabv3_plus.yaml 2 --log-iter 20
```
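For reference, `${@:3}` is Bash's slice expansion applied to the positional parameters: it expands to every argument from the third one onward. A minimal sketch of how a launcher like `dist_train.sh` forwards arguments (this script body is an illustration of the pattern, not the repo's exact file):

```shell
#!/bin/bash
# Illustrative launcher: $1 = config file, $2 = GPU count,
# ${@:3} = all remaining arguments, passed through to train.py.
CONFIG=$1
GPUS=$2

echo "config: $CONFIG"
echo "gpus:   $GPUS"
echo "extra:  ${@:3}"

# The real script would then run something like:
# python -m torch.distributed.launch --nproc_per_node=$GPUS \
#     "$(dirname "$0")/train.py" "$CONFIG" "${@:3}"
```

So in `./tools/dist_train.sh configs/cityscapes_deeplabv3_plus.yaml 2 --log-iter 20`, the slice `${@:3}` expands to `--log-iter 20`, which is exactly what gets appended to the `train.py` invocation.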