Closed: Lubby-ch closed this issue 4 years ago
You can try a command like this to train DeepLab with 2 GPUs:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py configs/cityscapes_deeplabv3_plus.yaml
${@:3} stands for the custom arguments forwarded to train.py; for example, to print the log every 20 iterations:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py configs/cityscapes_deeplabv3_plus.yaml --log-iter 20
which is the same as:
CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh configs/cityscapes_deeplabv3_plus.yaml 2 --log-iter 20
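For reference, here is a minimal sketch of how tools/dist_train.sh wires up its arguments, reconstructed from the line quoted in the question. It assumes the usual convention that $1 is the config file and $2 is the GPU count; the real script may set additional variables (e.g. a master port):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tools/dist_train.sh (not the verbatim script).
CONFIG=$1                 # first positional argument: the config file
GPUS=$2                   # second positional argument: number of GPUs
PYTHON=${PYTHON:-python}  # assumed default; lets the caller override the interpreter

# ${@:3} expands to all positional arguments from the third one onward,
# so everything after the GPU count is forwarded verbatim to train.py
# (e.g. --log-iter 20 in the command above).
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
    $(dirname "$0")/train.py $CONFIG ${@:3}
```

So in the dist_train.sh call above, $1 is configs/cityscapes_deeplabv3_plus.yaml, $2 is 2, and ${@:3} expands to --log-iter 20.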
In the line
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
    $(dirname "$0")/train.py $CONFIG ${@:3}
what does ${@:3} stand for? When training with multiple GPUs, I have run into some problems that I cannot solve. Could you give me a concrete command to run instead of this line?