I think the model weights are released here: https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
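In case it helps anyone landing here later, a minimal sketch of pulling one of the checkpoints locally. The repo id `nvidia/mamba2-hybrid-8b-3t-4k` is an assumption based on the folder name used later in this thread; check the collection page for the exact id of the model you want.

```bash
# Clone one checkpoint repo from the Hugging Face collection.
# Repo id is assumed from the folder name mentioned below in this
# thread; adjust to the model you actually want.
git lfs install
git clone https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-4k \
    /workspace/checkpoints/mamba2-hybrid-8b-3t-4k
```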
Thanks! I've already found it. When this question was posted, the weights hadn't been made public yet.
Now I'm looking for the tokenizer 🤣. Running the example requires a tokenizer, but I cannot find one. Any idea about this?
I think the tokenizer path should point to the `.model` file in the Hugging Face repo. For example, I downloaded the `mamba2-hybrid-8b-3t-4k` repo from Hugging Face, and `mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model` is the tokenizer. I'm running inference using `run_text_gen_server_8b.sh`, and the checkpoint/tokenizer paths are

`CHECKPOINT_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/"`
`TOKENIZER_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model"`

respectively.
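If you want to double-check that the `.model` file really is the tokenizer before launching the server, here is a quick sanity check. This is just a sketch: it assumes the `sentencepiece` Python package is installed, and that the file is a SentencePiece model (which the `.model` extension and Megatron's SentencePiece tokenizer support suggest, but I haven't confirmed beyond running it).

```bash
# Load the .model file as a SentencePiece model and encode a test string.
# Path matches the TOKENIZER_PATH used above; adjust to your download location.
TOKENIZER_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model"
python -c "import sentencepiece as spm; \
sp = spm.SentencePieceProcessor(model_file='$TOKENIZER_PATH'); \
print(sp.encode('hello world', out_type=str))"
```

If it prints a list of subword tokens, the path is the right one to pass as `TOKENIZER_PATH`.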
Wow, thank you so much for your guidance! It took me hours to find anything that looked like a tokenizer.
I've never used Megatron before. You really saved my life!!
Your question: [An Empirical Study of Mamba-based Language Models](https://github.com/NVIDIA/Megatron-LM/tree/ssm/examples/mamba)

Hi! I'm impressed by this work and can't wait to try the new Mamba-2-Hybrid. The paper mentions that the weights are released on Hugging Face, but I cannot find any. Have they been released yet? If so, where can I download them?
Thanks a lot, folks, for your contribution to the community!