If I don't have a SLURM cluster, how can I fine-tune models with NeMo-Aligner across multiple nodes? Can you show a demo?
Can I use torchrun to start a distributed task with NeMo-Aligner?
I'm not familiar with non-SLURM setups, but distributed initialization is handled by PyTorch Lightning, so whatever launch method works with PTL (e.g. torchrun) should hopefully work here as well.
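Since PyTorch Lightning picks up the standard environment variables that torchrun sets (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, etc.), a non-SLURM launch might look like the sketch below. This is untested; the script path, IP address, and node/GPU counts are illustrative placeholders, and the `trainer.*` overrides follow the usual NeMo Hydra-style config convention.

```shell
# Sketch: run this command on EACH node, changing only --node_rank
# (0 on the rendezvous/master node, 1 on the second node, ...).
# --master_addr must be an address reachable from every node.
# The script path and config overrides below are illustrative, not confirmed.
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --master_addr=10.0.0.1 \
  --master_port=29500 \
  examples/nlp/gpt/train_gpt_sft.py \
  trainer.num_nodes=2 \
  trainer.devices=8
```

If this works, the same pattern should apply to the other NeMo-Aligner training scripts; the key point is that `trainer.num_nodes` and `trainer.devices` must match the values passed to torchrun.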