V2AI / Det3D

World's first general purpose 3D object detection codebase.
https://arxiv.org/abs/1908.09492
Apache License 2.0

Multi-Node multi-process distributed training for Det3D #99

Closed: jinglin80 closed this issue 4 years ago

jinglin80 commented 4 years ago

We are trying to run multi-node, multi-process distributed training for Det3D with the following commands:

Node 1 (IP: 192.168.1.1, with a free port: 1234):

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script)

Node 2:

    python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script)

Source: https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py
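For reference, the two commands differ only in --node_rank. Below is a minimal sketch that builds the per-node command and shows how the launcher derives each worker's global rank (node_rank * nproc_per_node + local_rank). The helper names build_launch_command and global_rank are hypothetical, and the GPU count in the demo is an assumed value, not something taken from Det3D.

```python
import shlex


def build_launch_command(node_rank, nproc_per_node, nnodes,
                         master_addr, master_port, script, script_args=()):
    """Return the argv list for one node's torch.distributed.launch call.

    Hypothetical helper: it only assembles the flags quoted in this issue.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
        *script_args,
    ]


def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a worker: processes are numbered node by node."""
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Assumed setup for illustration: 2 nodes with 8 GPUs each.
    for node_rank in range(2):
        cmd = build_launch_command(node_rank, 8, 2, "192.168.1.1", 1234,
                                   "YOUR_TRAINING_SCRIPT.py")
        print(f"Node {node_rank + 1}:", shlex.join(cmd))
```

Every node must use the same --master_addr and --master_port (those of node 0), and the total number of processes is nnodes * nproc_per_node.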

Is this method applicable to training on the nuScenes dataset?

poodarchu commented 4 years ago

Multi-node training has not been tested, but I think you can try it yourself.
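When trying this, one thing worth checking is that the training script picks up what torch.distributed.launch provides: by default the launcher passes --local_rank to each process and sets the RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables. A minimal sketch of reading these values follows. The function name read_launch_context and the fallback defaults are hypothetical, and this is not taken from Det3D's own training script.

```python
import argparse
import os


def read_launch_context(argv=None, environ=None):
    """Collect the values torch.distributed.launch hands to each worker.

    Hypothetical helper; the fallbacks describe a single-process run.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # The launcher appends --local_rank=<n> to the script's arguments.
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore the script's own arguments (--arg1, --arg2, ...).
    args, _unknown = parser.parse_known_args(argv)
    return {
        "local_rank": args.local_rank,
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
        "master_addr": environ.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(environ.get("MASTER_PORT", 29500)),
    }


if __name__ == "__main__":
    ctx = read_launch_context()
    print(ctx)
    # In a real training script, each process would typically continue with:
    #   torch.cuda.set_device(ctx["local_rank"])
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

If world_size printed on every node equals the total number of GPUs across both machines, the launcher side is configured consistently. Whether nuScenes training then runs correctly still has to be verified, since multi-node training is untested here.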