We are trying to run multi-node, multi-process distributed training for Det3D with the following commands:
Node 1 (IP: 192.168.1.1, with a free port 1234)::

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
        --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \
        --master_port=1234 YOUR_TRAINING_SCRIPT.py \
        (--arg1 --arg2 --arg3 and all other arguments of your training script)
Node 2::

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
        --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" \
        --master_port=1234 YOUR_TRAINING_SCRIPT.py \
        (--arg1 --arg2 --arg3 and all other arguments of your training script)
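To sanity-check the flag values above, here is a small sketch (plain Python, not the launcher's actual source) of how ``torch.distributed.launch`` derives each worker's global rank and the world size from ``--nnodes``, ``--node_rank``, and ``--nproc_per_node``: the world size is ``nnodes * nproc_per_node``, and each process gets global rank ``node_rank * nproc_per_node + local_rank``. The function names here are illustrative only::

    # Illustrative helpers mirroring the launcher's rank arithmetic.
    def world_size(nnodes: int, nproc_per_node: int) -> int:
        # Total number of processes across all nodes.
        return nnodes * nproc_per_node

    def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
        # Global rank of one process: node offset plus local GPU index.
        return node_rank * nproc_per_node + local_rank

    # Example: 2 nodes with 4 GPUs each -> world size 8;
    # node 0 hosts ranks 0-3, node 1 hosts ranks 4-7.
    print(world_size(2, 4))        # 8
    print(global_rank(1, 4, 0))    # 4 (first GPU on the second node)

This is why ``--node_rank`` must differ between the two commands while every other flag stays identical: both nodes must agree on the same rendezvous point (``--master_addr``/``--master_port``) and the same world size.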
Source: https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py
Is this method applicable for training on the nuScenes dataset?