Guitaricet / relora

Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
https://arxiv.org/abs/2307.05695
Apache License 2.0

RFC - multinode training #15

Closed: omri123 closed this issue 6 months ago.

omri123 commented 9 months ago

Hi, I would like to implement multi-node training. Would you accept this kind of contribution? What are the known gaps toward multi-node training? Thanks, Omri

Guitaricet commented 6 months ago

Multinode training is already implemented. Just use torchrun to launch torchrun_main.py on every node.
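The sketch below is a minimal example, assuming a static setup where every node can reach the first node's address. It builds the torchrun command line that each node would run. The flags (--nnodes, --nproc_per_node, --node_rank, --master_addr, --master_port) are standard torchrun options, and 29500 is torchrun's default port. The helper names, the node and GPU counts, and the address 10.0.0.1 are hypothetical placeholders. Any training arguments that torchrun_main.py itself expects are not shown here and would have to be passed through script_args.

```python
import shlex
from typing import List, Sequence


def torchrun_command(
    node_rank: int,
    nnodes: int,
    nproc_per_node: int,
    master_addr: str,
    master_port: int = 29500,  # torchrun's default master port
    script_args: Sequence[str] = (),  # hypothetical: torchrun_main.py's own args go here
) -> List[str]:
    """Build the argv to run on one node of a static multi-node torchrun job."""
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "torchrun_main.py",
        *script_args,
    ]


def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
    # With a fixed --node_rank (static rendezvous), each worker's global RANK
    # is node_rank * nproc_per_node + LOCAL_RANK.
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Hypothetical job: 2 nodes x 8 GPUs, node 0 reachable at 10.0.0.1.
    for rank in range(2):
        print(shlex.join(torchrun_command(rank, 2, 8, "10.0.0.1")))
```

The same command runs on every node. Only --node_rank changes, and node 0 hosts the rendezvous at master_addr:master_port. The job then has nnodes * nproc_per_node workers in total.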