FlagOpen / FlagScale

FlagScale is a large-model toolkit built on open-source projects.

[Hetero] Improve the general heterogeneous training parallelism #180

Closed. Corle-hyz closed this 3 months ago.

Corle-hyz commented 3 months ago

This PR enables heterogeneous training based on process meshes. Each mesh may use a different tensor-parallel size and a different pipeline-parallel split, and multiple meshes can be combined in a single training job.
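
To make the idea of process meshes concrete, here is a minimal sketch of how a job might be described as several meshes with different parallel sizes. It is illustrative only and is not FlagScale's implementation or configuration format: the `ProcessMesh` class, its `tp`/`dp`/`pp` fields, and the `assign_ranks` helper are all hypothetical names, and the contiguous rank layout is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessMesh:
    """Hypothetical description of one homogeneous group of devices."""
    tp: int  # tensor-parallel size inside this mesh
    dp: int  # data-parallel size inside this mesh
    pp: int  # number of pipeline stages hosted by this mesh

    @property
    def size(self) -> int:
        # Number of processes (devices) the mesh occupies.
        return self.tp * self.dp * self.pp


def assign_ranks(meshes: List[ProcessMesh]) -> List[range]:
    """Give each mesh a contiguous block of global ranks, in order.

    Assumption: ranks are laid out mesh after mesh with no gaps.
    """
    blocks, start = [], 0
    for mesh in meshes:
        blocks.append(range(start, start + mesh.size))
        start += mesh.size
    return blocks


def total_pipeline_stages(meshes: List[ProcessMesh]) -> int:
    """The overall pipeline depth is the sum of per-mesh stage counts."""
    return sum(mesh.pp for mesh in meshes)


# Two meshes with different tensor-parallel sizes and pipeline splits.
meshes = [ProcessMesh(tp=2, dp=1, pp=1), ProcessMesh(tp=4, dp=1, pp=2)]
blocks = assign_ranks(meshes)
for mesh, block in zip(meshes, blocks):
    print(mesh, "-> global ranks", block.start, "to", block.stop - 1)
```

In this sketch the first mesh occupies 2 processes and the second occupies 8, so the job needs 10 processes and has 3 pipeline stages in total. The actual options used by this PR are defined in the `config_hetero` configuration referenced under Usage.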

Usage:

    python run.py --config-path ./examples/aquila/conf --config-name config_hetero