Closed Corle-hyz closed 3 months ago
This PR enables heterogeneous training based on process meshes. Meshes may differ in tensor-parallel size and pipeline-parallel split, and multiple meshes are supported.
Usage
python run.py --config-path ./examples/aquila/conf --config-name config_hetero
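To illustrate the idea of process meshes with differing tensor-parallel sizes, here is a minimal sketch. It is not the code from this PR: the `ProcessMesh` class, the `assign_ranks` and `tp_groups` helpers, and the contiguous rank layout are all hypothetical, chosen only to show how global ranks could be partitioned across meshes of different shapes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessMesh:
    """Hypothetical description of one mesh: tp x dp x pp devices."""
    tp: int  # tensor-parallel size
    dp: int  # data-parallel size
    pp: int  # number of pipeline stages held by this mesh

    @property
    def size(self) -> int:
        return self.tp * self.dp * self.pp


def assign_ranks(meshes: List[ProcessMesh]) -> List[range]:
    """Give each mesh a contiguous block of global ranks (an assumed layout)."""
    blocks, start = [], 0
    for mesh in meshes:
        blocks.append(range(start, start + mesh.size))
        start += mesh.size
    return blocks


def tp_groups(mesh: ProcessMesh, ranks: range) -> List[List[int]]:
    """Split a mesh's ranks into tensor-parallel groups of consecutive ranks."""
    r = list(ranks)
    return [r[i:i + mesh.tp] for i in range(0, len(r), mesh.tp)]


# Two meshes with different shapes: 2-way TP over 2 pipeline stages,
# followed by 4-way TP over a single pipeline stage.
meshes = [ProcessMesh(tp=2, dp=1, pp=2), ProcessMesh(tp=4, dp=1, pp=1)]
blocks = assign_ranks(meshes)
for mesh, block in zip(meshes, blocks):
    print(mesh, list(block), tp_groups(mesh, block))
```

In this sketch the first mesh owns ranks 0 to 3 and the second owns ranks 4 to 7, so eight processes in total would be needed. The actual rank layout and configuration keys used by `config_hetero` may differ.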