TRI-ML / dd3d

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.
MIT License

Multi-node training #40

Open EphChem opened 1 year ago

EphChem commented 1 year ago

Hi there, thank you so much for this release! The repo appears to be set up for multi-node training, judging by the following lines: https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/tridet/utils/setup.py#L57 https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/Makefile#L42

Have you trained on multiple nodes (not just multiple GPUs), where you have to provide two different IP addresses from within the Docker containers provided in this repo? And has this worked for you? When I launch training on two different machines, the code hangs and I don't see any terminal output...

Thank you in advance!
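
A silent hang at startup across two machines can have several causes. One possibility, which is only an assumption here, is that the second machine cannot reach the address and port the first one is waiting on, since distributed initialization blocks until every process has joined. The sketch below is a plain TCP reachability check using only the Python standard library; it is not part of this repo, and `can_reach`, the example address, and the port are hypothetical placeholders.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for debugging; it only tests network reachability,
    not whether the training processes themselves are configured correctly.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from inside the container on the second machine while the first is waiting at startup, a call such as `can_reach("192.0.2.10", 29500)` (placeholder address and port; substitute whatever your launch command uses) returning `False` would suggest a networking or Docker port-mapping issue rather than a problem in the training code.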