bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

List of things to be (potentially) ported from Megatron-DeepSpeed #6

Closed: mayank31398 closed this issue 1 year ago

mayank31398 commented 2 years ago

Feel free to add more things if required.

pacman100 commented 2 years ago

Hello, adding a few details as per the discussion on Slack.

  1. Checkpoint reshaping for different TP/PP sizes is already present in Megatron-LM: tools/checkpoint_util.py.

  2. For universal checkpoints, it would be a two-step process (a rough sketch follows below):

    1. Use the Megatron-LM tool tools/checkpoint_util.py to reshape the checkpoint to TP=PP=1.
    2. Use convert_megatron_gpt2_checkpoint.py to convert it to a HF checkpoint.

However, I think these tools will need adaptation, as BigCode is going to use Multi-Query Attention.
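
For concreteness, a sketch of the two steps. All paths are illustrative, and the exact flag names vary across Megatron-LM and transformers versions, so check each script's `--help` before running:

```bash
# Step 1: reshape the Megatron checkpoint to TP=1, PP=1
# (flag names as in the Megatron-LM tool at the time; verify with --help)
python tools/checkpoint_util.py \
    --model-type GPT \
    --load-dir /path/to/megatron/checkpoint \
    --save-dir /path/to/merged/checkpoint \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1

# Step 2: convert the merged TP=1/PP=1 checkpoint to a Hugging Face checkpoint
# (the script ships with transformers under models/megatron_gpt2/ and takes
#  the checkpoint path as a positional argument)
python convert_megatron_gpt2_checkpoint.py /path/to/merged/checkpoint
```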

mintsugaEHEH commented 1 year ago

Hello. I tried to reshape a fine-tuned StarCoder checkpoint (https://huggingface.co/bigcode/starcoder-megatron/tree/main) from TP=4, PP=4 to TP=8, PP=1 using tools/checkpoint_util.py, but I ran into an out-of-memory (OOM) error. The machine I used has 512 GB of RAM, which should be enough to hold the whole model. Any suggestions for solving this?
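
One likely contributor to the OOM is the optimizer state stored alongside the weights: StarCoder is roughly 15.5B parameters, so the bf16 weights alone are only ~31 GB, but full fp32 Adam states can push the checkpoint well past 200 GB, and the reshaping tool may hold more than one copy in memory at once. A minimal sketch of one possible mitigation, assuming the standard Megatron layout of mp_rank_*/model_optim_rng.pt files with a top-level "optimizer" key (verify against the actual checkpoint, and work on a copy, before trying this):

```python
# Sketch: strip optimizer states from each rank's checkpoint file before
# reshaping, to cut peak memory. The directory layout and the "optimizer"
# key are assumptions about the standard Megatron checkpoint format.
import glob
import torch

CKPT_DIR = "/path/to/starcoder/iter_0000001"  # illustrative path

for path in sorted(glob.glob(f"{CKPT_DIR}/mp_rank_*/model_optim_rng.pt")):
    state = torch.load(path, map_location="cpu")  # avoid touching the GPU
    if "optimizer" in state:
        del state["optimizer"]   # drop Adam moments; weights stay intact
        torch.save(state, path)  # overwrite the copy in place
        print(f"stripped optimizer states from {path}")
```

After stripping, rerunning tools/checkpoint_util.py on the slimmed copy should need memory closer to the size of the weights themselves.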

mayank31398 commented 1 year ago

Closing since it's too old.