StrongChris opened this issue 11 months ago
Can you try `torchrun -m yourproject.train [ARGS]`?
I think it should work out of the box; if not, I'll have a look. You won't be able to use some of the Dora features, like inserting the base config from an existing run with `-f [SIG]`. If all your machines are on slurm, then it will definitely be easier to use the grid system. If training on a single machine, just run `dora run -d`.
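To make the two launch modes above concrete, here is a small sketch. The flags on the `torchrun` line are standard `torchrun` rendezvous options, and `yourproject.train`, the node count, GPU count, and `master:29500` endpoint are placeholder assumptions, not values from this thread:

```shell
# Single machine with several GPUs: let Dora spawn the workers itself.
single_node="dora run -d"

# Multiple machines without slurm: run the same torchrun command on every
# node, pointing all of them at one rendezvous endpoint (assumed values).
multi_node="torchrun --nnodes=2 --nproc_per_node=8 \
  --rdzv_backend=c10d --rdzv_endpoint=master:29500 \
  -m yourproject.train"

echo "$single_node"
echo "$multi_node"
```

On a real cluster you would run the `torchrun` command once per node, with the same `--rdzv_endpoint` everywhere.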
I've checked that this actually works, even on multiple machines; see: https://github.com/facebookresearch/dora/blob/main/README.md#multi-node-training-without-slurm
Hi, what does `[ARGS]` here mean?
❓ Questions
`torchrun` is the standard recommended way to run multi-GPU, multi-machine training. How can one launch projects that are written to use Dora using `torchrun`?