allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Allow hybrid sharding to have multiple replicas in a node #582

Closed: 2015aroras closed this 2 months ago

2015aroras commented 2 months ago

This PR changes our hybrid sharding to allow multiple replicas within a single node. One benefit is that hybrid sharding can then be used to do no sharding at all: set the number of replicas equal to the world size, so each replica is a single rank holding the full model. With that, we no longer need DDP or FSDP's NO_SHARD strategy (which is being deprecated).
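A minimal sketch of the rank layout this implies, assuming contiguous ranks form a shard group and ranks at the same offset across shard groups form a replicate group. The helper name `hybrid_shard_groups` is hypothetical and this is not the PR's actual code. In PyTorch, each rank list could be turned into a process group with `torch.distributed.new_group` (every rank must call it for every group, in the same order), and the pair for the current rank passed to FSDP as `process_group=(shard_group, replicate_group)` with `ShardingStrategy.HYBRID_SHARD`.

```python
from typing import List, Tuple


def hybrid_shard_groups(
    world_size: int, num_replicas: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split ranks 0..world_size-1 into shard groups and replicate groups.

    Hypothetical helper (illustration only): each shard group holds one full
    copy of the model, sharded across its ranks; each replicate group holds
    the same shard across all replicas.
    """
    if world_size <= 0 or num_replicas <= 0:
        raise ValueError("world_size and num_replicas must be positive")
    if world_size % num_replicas != 0:
        raise ValueError("world_size must be divisible by num_replicas")
    shard_size = world_size // num_replicas
    # Replica r owns the contiguous ranks [r * shard_size, (r + 1) * shard_size).
    shard_groups = [
        list(range(r * shard_size, (r + 1) * shard_size)) for r in range(num_replicas)
    ]
    # The rank at offset s in every shard group holds the same shard.
    replicate_groups = [list(range(s, world_size, shard_size)) for s in range(shard_size)]
    return shard_groups, replicate_groups


# 8 GPUs in one node, 2 replicas: two replicas share the node.
print(hybrid_shard_groups(8, 2))
# 8 GPUs, 8 replicas: every shard group is a single rank, i.e. no sharding.
print(hybrid_shard_groups(8, 8))
```

With `num_replicas == world_size`, each shard group has one rank and the single replicate group spans all ranks, which is the DDP-like "no sharding" case described above.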