We are trying to run multi-node, multi-process distributed training for Det3D with the following commands:
Node 1 (IP: 192.168.1.1, with a free port 1234)::

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
        --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \
        --master_port=1234 YOUR_TRAINING_SCRIPT.py \
        (--arg1 --arg2 --arg3 and all other arguments of your training script)
Node 2::

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
        --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" \
        --master_port=1234 YOUR_TRAINING_SCRIPT.py \
        (--arg1 --arg2 --arg3 and all other arguments of your training script)
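To sanity-check the flag values above, here is a small sketch (plain Python, not the launcher's actual source) of how ``torch.distributed.launch`` derives each worker's global rank and the world size from ``--nnodes``, ``--node_rank``, and ``--nproc_per_node``: the world size is ``nnodes * nproc_per_node``, and each process gets global rank ``node_rank * nproc_per_node + local_rank``. The function names here are illustrative only::

    # Illustrative helpers mirroring the launcher's rank arithmetic.
    def world_size(nnodes: int, nproc_per_node: int) -> int:
        # Total number of processes across all nodes.
        return nnodes * nproc_per_node

    def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
        # Global rank of one process: node offset plus local GPU index.
        return node_rank * nproc_per_node + local_rank

    # Example: 2 nodes with 4 GPUs each -> world size 8;
    # node 0 hosts ranks 0-3, node 1 hosts ranks 4-7.
    print(world_size(2, 4))        # 8
    print(global_rank(1, 4, 0))    # 4 (first GPU on the second node)

This is why ``--node_rank`` must differ between the two commands while every other flag stays identical: both nodes must agree on the same rendezvous point (``--master_addr``/``--master_port``) and the same world size.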
Source: https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py
Is this method applicable for training on the nuScenes dataset?