allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0
4.24k stars 399 forks

Delay device mesh import #561

Closed 2015aroras closed 2 months ago

2015aroras commented 2 months ago

DeviceMesh was introduced in torch 2.2, so importing it under torch 2.1 caused the training code to break. This PR delays the import until it is actually needed (for hybrid sharding).
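A minimal sketch of the delayed-import pattern described above, not the actual diff from this PR. The function name `get_device_mesh_cls` and its `use_hybrid_sharding` flag are hypothetical; the only torch name assumed is the `torch.distributed.device_mesh` module, which exposes `DeviceMesh` in torch 2.2.

```python
import importlib


def get_device_mesh_cls(use_hybrid_sharding: bool):
    """Return the DeviceMesh class only when hybrid sharding is requested.

    Hypothetical helper: the import happens inside the function, so merely
    loading the training script never touches torch.distributed.device_mesh.
    """
    if not use_hybrid_sharding:
        # Non-hybrid runs never need DeviceMesh, so torch 2.1 keeps working.
        return None

    try:
        # Deferred import: only reached on the hybrid sharding path.
        module = importlib.import_module("torch.distributed.device_mesh")
    except ImportError as exc:
        raise RuntimeError(
            "Hybrid sharding needs DeviceMesh, which requires torch >= 2.2"
        ) from exc

    return getattr(module, "DeviceMesh")
```

With this shape, a run that does not use hybrid sharding never triggers the import, and a hybrid run on an older torch fails with an explicit version message rather than an import error at startup.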

Fixes #559