mayank31398 closed this issue 1 year ago
Hello, adding a few details as per the discussion on Slack.
Checkpoint reshaping for different TP/PP sizes is already present in Megatron-LM: checkpoint_util.py.
For universal checkpoints, it would be a two-step process:
However, I think these need adaptation as BigCode is going to use Multi-Query Attention.
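To illustrate why Multi-Query Attention might need special handling, here is a minimal sketch of resharding across TP sizes. It is not the code in checkpoint_util.py. Plain Python lists stand in for tensors, and the helper names `reshard_column_parallel` and `reshard_replicated` are hypothetical. The sketch assumes the query projection is split row-wise across TP ranks while the single shared key/value projection is replicated on every rank; that layout is an assumption and should be checked against the actual checkpoint.

```python
# Sketch only: weights are lists of rows; each TP rank holds one shard.

def reshard_column_parallel(shards, new_tp):
    """Merge shards that were split along the output (row) axis,
    then split the full weight again into new_tp equal parts."""
    full = [row for shard in shards for row in shard]
    if len(full) % new_tp != 0:
        raise ValueError("row count is not divisible by the new TP size")
    rows_per_rank = len(full) // new_tp
    return [full[i * rows_per_rank:(i + 1) * rows_per_rank]
            for i in range(new_tp)]


def reshard_replicated(shards, new_tp):
    """Handle a weight that is assumed identical on every rank
    (here: the shared MQA key/value projection). Verify that the
    copies match, then hand one copy to each new rank."""
    reference = shards[0]
    for shard in shards[1:]:
        if shard != reference:
            raise ValueError("replicated shards differ across ranks")
    return [[list(row) for row in reference] for _ in range(new_tp)]


# Toy dimensions (hypothetical): 8 query heads, head_dim 2, hidden size 4.
n_heads, head_dim, hidden = 8, 2, 4
query = [[float(r * hidden + c) for c in range(hidden)]
         for r in range(n_heads * head_dim)]
key_value = [[1.0] * hidden for _ in range(2 * head_dim)]  # one K and one V head

old_tp, new_tp = 4, 8
rows_per_old_rank = len(query) // old_tp
query_tp4 = [query[i * rows_per_old_rank:(i + 1) * rows_per_old_rank]
             for i in range(old_tp)]
key_value_tp4 = [key_value for _ in range(old_tp)]

query_tp8 = reshard_column_parallel(query_tp4, new_tp)
key_value_tp8 = reshard_replicated(key_value_tp4, new_tp)
```

The point of the sketch is that a reshaping tool written for standard multi-head attention would try to split the key/value weight across ranks the same way as the query weight, which does not work when there is only one key/value head.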
Hello. I am trying to reshape a fine-tuned checkpoint of StarCoder (https://huggingface.co/bigcode/starcoder-megatron/tree/main) from TP=4, PP=4 to TP=8, PP=1 using tools/checkpoint_util.py, but I ran into an out-of-memory (OOM) error. The machine I used has 512 GB of memory, which should be enough to load the whole model. Is there a way to solve this issue?
Closing since it's too old.
Feel free to add more things if required.