Multi-node training

Hi there, thank you so much for this release! I am trying to run multi-node training, and the repo appears to be equipped for it, judging by the following lines: https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/tridet/utils/setup.py#L57 https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/Makefile#L42

Have you trained on multiple nodes (not just multiple GPUs), where two different IP addresses have to be provided from within the Docker containers included in this repo? If so, did it work for you? When I launch training on two different machines, the code hangs and I don't see any terminal output.
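In case it helps narrow this down, here is a minimal sketch of a connectivity check that could be run from inside the container on the second machine. It assumes the nodes rendezvous over a plain TCP address and port, which is common for distributed training but is not something I have confirmed for this repo. The helper name `can_reach` and any host or port values passed to it are hypothetical placeholders, not settings taken from the repo.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: it only tests raw TCP reachability and says nothing
    about whether the training processes themselves are configured correctly.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_reach(host, port)` with the first machine's address and rendezvous port, while the first machine is already waiting for peers, would show whether the port is reachable at all. A `False` result would point toward the Docker network or port mapping or a firewall rather than the training code.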

Thank you in advance!

TRI-ML / dd3d

Multi-node training #40