TRI-ML / dd3d

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.
MIT License

Multi-GPU training inside docker #6

Open williamhyin opened 2 years ago

williamhyin commented 2 years ago

Hi,

Thanks for releasing the code. I have a question about the multi-GPU training command. Is it possible to train with multiple GPUs (8) inside Docker?

For example: python -m torch.distributed.launch --nproc_per_node 8 train.py xxx

Launching multi-GPU training from outside Docker with the following command is inconvenient for training on a server:

make docker-run-mpi COMMAND=""

I look forward to your reply, and thanks again for the great work!

dennis-park-TRI commented 2 years ago

Thanks for the interest, @williamhyin. By default, we only support multi-GPU training via make docker-run-mpi ... . It should be possible to modify train.py to make it work with the PyTorch launcher. We will have a look at this if there are enough use cases.
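
As a rough illustration of what such a change might involve (a sketch under stated assumptions, not the repository's code): Open MPI's mpirun exposes each process's rank through OMPI_COMM_WORLD_* environment variables, while the PyTorch launcher uses RANK, LOCAL_RANK, and WORLD_SIZE. The hypothetical helper detect_dist_env below reads whichever set is present. How train.py would consume these values is not covered here.

```python
import os


def detect_dist_env(environ=None):
    """Return (rank, local_rank, world_size) from launcher-set variables.

    Hypothetical helper for illustration; it is not part of dd3d.
    """
    env = os.environ if environ is None else environ

    # Open MPI's mpirun exports OMPI_COMM_WORLD_* to every process it starts.
    if "OMPI_COMM_WORLD_RANK" in env:
        return (
            int(env["OMPI_COMM_WORLD_RANK"]),
            int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", 0)),
            int(env["OMPI_COMM_WORLD_SIZE"]),
        )

    # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE. Older versions of
    # torch.distributed.launch only export LOCAL_RANK when run with --use_env,
    # so fall back to 0 if it is missing.
    if "RANK" in env and "WORLD_SIZE" in env:
        return (
            int(env["RANK"]),
            int(env.get("LOCAL_RANK", 0)),
            int(env["WORLD_SIZE"]),
        )

    # No launcher detected: treat this as a single-process run.
    return 0, 0, 1


if __name__ == "__main__":
    rank, local_rank, world_size = detect_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

Checking the Open MPI variables first means the same entry point could, in principle, be started by either launcher without extra flags.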

HeroyiuWFY commented 2 years ago

Hi, I would also like to know how to train with multiple GPUs using python -m torch.distributed.launch --nproc_per_node 8 train.py xxx. Looking forward to your reply, and thanks again for the great work!

revisitq commented 2 years ago

Hi, guys! It's easy to train with multiple GPUs without Docker. After installing all the requirements, just run the command CUDA_VISIBLE_DEVICES="x,x,x,x" mpirun -np ${num_gpus} ./script/train.py +experiments=dd3d_kitti_dla34.yaml to start multi-GPU training.

rockywind commented 2 years ago

Hi @revisitq, I got the following error:

mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: ./script/train.py
Node: shaxbw06
while attempting to start process rank 0.

The command line was:

CUDA_VISIBLE_DEVICES=5,7 mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml 
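
For anyone hitting the same message, here is a small diagnostic sketch. It assumes the usual causes of this mpirun error: the path does not exist relative to the current directory, the file lacks the executable bit, or it has no shebang line and therefore cannot be run without an interpreter. The function name diagnose_executable is hypothetical, and the actual cause in this case is not confirmed.

```python
import os


def diagnose_executable(path):
    """Return a list of likely reasons mpirun cannot execute `path` directly."""
    problems = []

    # mpirun resolves the path relative to the current working directory.
    if not os.path.isfile(path):
        problems.append("file not found: " + path)
        return problems

    # Running a script directly requires the executable bit.
    if not os.access(path, os.X_OK):
        problems.append("file is not executable (try chmod +x)")

    # Without a shebang the OS does not know which interpreter to use.
    with open(path, "rb") as f:
        if f.read(2) != b"#!":
            problems.append("no shebang line; launch it through 'python' instead")

    return problems


if __name__ == "__main__":
    for candidate in ("./script/train.py", "./scripts/train.py"):
        print(candidate, diagnose_executable(candidate) or "looks launchable")
```

Run it from the repository root. If either check fails, passing the script to python explicitly (mpirun ... python path/to/train.py ...) sidesteps the executable-bit and shebang requirements.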

revisitq commented 2 years ago

@rockywind, two things to check:

  1. Make sure you have installed the dependencies as specified in the Dockerfile.
  2. Check your command. It should be CUDA_VISIBLE_DEVICES="5,7" mpirun -np 2 ./script/train.py +experiments=dd3d_kitti_dla34.yaml

azuryl commented 2 years ago

@williamhyin you can build a conda env yourself and run training with mpirun -n 8 python scripts/train.py +experiments=dd3d_kitti_dla34
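
The mpirun commands in this thread all pair a CUDA_VISIBLE_DEVICES list with a process count, and the two need to agree. A minimal sketch of deriving both from one list of GPU ids follows. build_mpirun_command is a hypothetical helper, and the default script path and experiment name are simply copied from the command above rather than verified against the repository.

```python
import shlex


def build_mpirun_command(gpu_ids, script="scripts/train.py",
                         experiment="dd3d_kitti_dla34"):
    """Return (env, argv) that starts one MPI process per listed GPU.

    Hypothetical helper; the defaults mirror the command quoted in the thread.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")

    # Restrict the visible devices to exactly the requested GPUs.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}

    # One process per GPU, launched through the Python interpreter so the
    # script does not need an executable bit or a shebang line.
    argv = ["mpirun", "-np", str(len(gpu_ids)),
            "python", script, "+experiments=" + experiment]
    return env, argv


if __name__ == "__main__":
    env, argv = build_mpirun_command([5, 7])
    # Print the shell command instead of running it (shlex.join needs Python 3.8+).
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
```

To actually launch, the pair could be passed to subprocess.run(argv, env={**os.environ, **env}) after importing os and subprocess.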