NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

Suggestion for merge (and split into) model parallel partitions #232

Open bzantium opened 2 years ago

bzantium commented 2 years ago

Currently, the script tools/merge_mp_partitions.py only supports merging tensor-model-parallel partitions and splitting the result into a given pipeline-model-parallel layout, which is quite constrained in practice. I suggest extending the script to merge across both tensor and pipeline parallelism, and also providing a separate script for splitting a checkpoint into partitions.
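To illustrate the idea, here is a minimal sketch (not Megatron-LM's actual API, and the function names are hypothetical) of what merging and splitting tensor-parallel shards of a single weight looks like: a column-parallel linear weight is sharded along the output dimension and a row-parallel one along the input dimension, so merging is a concatenation along the corresponding axis and splitting is its inverse.

```python
import numpy as np

def merge_tp_shards(shards, partition_dim):
    """Concatenate per-rank weight shards along their partition dimension.

    Hypothetical helper for illustration; in Megatron-LM the partition
    dimension depends on the layer type (column- vs row-parallel).
    """
    return np.concatenate(shards, axis=partition_dim)

def split_into_tp_shards(weight, tp_size, partition_dim):
    """Inverse operation: split a merged weight into tp_size equal shards."""
    return np.split(weight, tp_size, axis=partition_dim)

# Example: an 8x4 column-parallel weight sharded across tp_size=2 ranks,
# i.e. each rank holds a 4x4 slice of the output dimension (axis 0).
full = np.arange(32, dtype=np.float32).reshape(8, 4)
shards = split_into_tp_shards(full, tp_size=2, partition_dim=0)
merged = merge_tp_shards(shards, partition_dim=0)
assert merged.shape == (8, 4)
assert np.array_equal(merged, full)  # round-trip recovers the original
```

A full merge tool would apply this per parameter with the correct axis per layer, and handling pipeline parallelism additionally requires stitching together the per-stage checkpoint files, since each pipeline stage stores a disjoint range of layers.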

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.