NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

When can we have the MoE checkpoint conversion script? #790

Open shamanez opened 2 months ago

shamanez commented 2 months ago

As mentioned here, a proper MoE/Mixtral checkpoint conversion script would help us fine-tune Mixtral.

yqli2420 commented 2 months ago

+1

hwdef commented 1 month ago

I also strongly need this tool.

https://github.com/NVIDIA/Megatron-LM/issues/756#issuecomment-2126186633

oecompmind commented 3 weeks ago

Here is some information about converting Hugging Face checkpoints to NeMo. There appears to be a conversion script available on GitHub; I haven't confirmed it myself, but it might be useful: https://medium.com/karakuri/train-moes-on-aws-trainium-a0ebb599fbda and https://github.com/abeja-inc/Megatron-LM
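For anyone attempting this by hand in the meantime, the core of such a converter is a key remapping between the two state-dict layouts. The Hugging Face Mixtral names below (`block_sparse_moe.experts.<E>.w1/w2/w3`, `block_sparse_moe.gate`) are the real HF layout, but the Megatron-side target names here are hypothetical and may not match the actual Megatron-Core MoE module names; treat this as a sketch of the mapping step only, not a working converter:

```python
import re

def remap_key(hf_key: str) -> str:
    """Map a Hugging Face Mixtral state-dict key to a hypothetical
    Megatron-style key. Only MoE-related keys are handled here;
    attention/embedding keys would need their own rules."""
    # HF expert weights: model.layers.<L>.block_sparse_moe.experts.<E>.w{1,2,3}.weight
    m = re.match(
        r"model\.layers\.(\d+)\.block_sparse_moe\.experts\.(\d+)\.(w[123])\.weight",
        hf_key,
    )
    if m:
        layer, expert, w = m.groups()
        # w1/w3 are the gated-MLP input projections, w2 the output projection.
        # Target names are assumptions, not confirmed Megatron-Core names.
        target = {"w1": "linear_fc1_gate", "w3": "linear_fc1_up", "w2": "linear_fc2"}[w]
        return (
            f"decoder.layers.{layer}.mlp.experts."
            f"local_experts.{expert}.{target}.weight"
        )
    # HF router: model.layers.<L>.block_sparse_moe.gate.weight
    m = re.match(r"model\.layers\.(\d+)\.block_sparse_moe\.gate\.weight", hf_key)
    if m:
        return f"decoder.layers.{m.group(1)}.mlp.router.weight"
    return hf_key  # leave non-MoE keys untouched

print(remap_key("model.layers.0.block_sparse_moe.experts.3.w1.weight"))
```

A real converter also has to reshard the tensors across tensor/expert parallel ranks and possibly fuse w1/w3 into a single gated projection, which is where the community scripts linked above come in.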