If I don't have a SLURM cluster, how can I fine-tune models with NeMo-Aligner across multiple nodes? Can you show a demo?
Can I use torchrun to start a distributed task with NeMo-Aligner?
I'm not familiar with non-SLURM setups, but distributed initialization is handled by PyTorch Lightning, so whatever launch method works with PTL (e.g. torchrun) should hopefully work here as well.
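Since PyTorch Lightning picks up the standard environment variables that torchrun sets (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, etc.), a non-SLURM launch might look like the sketch below. This is untested; the script path, IP address, and node/GPU counts are illustrative placeholders, and the `trainer.*` overrides follow the usual NeMo Hydra-style config convention.

```shell
# Sketch: run this command on EACH node, changing only --node_rank
# (0 on the rendezvous/master node, 1 on the second node, ...).
# --master_addr must be an address reachable from every node.
# The script path and config overrides below are illustrative, not confirmed.
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --master_addr=10.0.0.1 \
  --master_port=29500 \
  examples/nlp/gpt/train_gpt_sft.py \
  trainer.num_nodes=2 \
  trainer.devices=8
```

If this works, the same pattern should apply to the other NeMo-Aligner training scripts; the key point is that `trainer.num_nodes` and `trainer.devices` must match the values passed to torchrun.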