bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Support interleaved pipeline schedules in checkpoint merging tools #45

Open RaymondLi0 opened 1 year ago

RaymondLi0 commented 1 year ago

https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/tools/checkpoint_loader_megatron.py#L121