I think the model weights are released here: https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
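In case it helps anyone landing here later, a minimal sketch of pulling one of the checkpoints locally. The repo id `nvidia/mamba2-hybrid-8b-3t-4k` is an assumption based on the folder name used later in this thread; check the collection page for the exact id of the model you want.

```bash
# Clone one checkpoint repo from the Hugging Face collection.
# Repo id is assumed from the folder name mentioned below in this
# thread; adjust to the model you actually want.
git lfs install
git clone https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-4k \
    /workspace/checkpoints/mamba2-hybrid-8b-3t-4k
```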
Thanks! I've already found it. When this question was posted, the weights hadn't been made public yet.
Now I'm looking for the tokenizer 🤣. Running the example requires a tokenizer, but I cannot find one. Any idea about this?
I think the tokenizer path should point to the `.model` file in the Hugging Face repo. For example, I downloaded the `mamba2-hybrid-8b-3t-4k` repo from Hugging Face, and `mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model` is the tokenizer. I'm running inference using `run_text_gen_server_8b.sh`, and the checkpoint/tokenizer paths are

`CHECKPOINT_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/"`
`TOKENIZER_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model"`

respectively.
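If you want to double-check that the `.model` file really is the tokenizer before launching the server, here is a quick sanity check. This is just a sketch: it assumes the `sentencepiece` Python package is installed, and that the file is a SentencePiece model (which the `.model` extension and Megatron's SentencePiece tokenizer support suggest, but I haven't confirmed beyond running it).

```bash
# Load the .model file as a SentencePiece model and encode a test string.
# Path matches the TOKENIZER_PATH used above; adjust to your download location.
TOKENIZER_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model"
python -c "import sentencepiece as spm; \
sp = spm.SentencePieceProcessor(model_file='$TOKENIZER_PATH'); \
print(sp.encode('hello world', out_type=str))"
```

If it prints a list of subword tokens, the path is the right one to pass as `TOKENIZER_PATH`.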
Wow, thank you so much for your guidance! It took me hours to find anything that looked like a tokenizer.
I've never used Megatron before. You really saved my life!!
Your question: [An Empirical Study of Mamba-based Language Models](https://github.com/NVIDIA/Megatron-LM/tree/ssm/examples/mamba)

Hi! I'm impressed by this work and can't wait to try the new Mamba-2-Hybrid. The paper mentions that the weights are released on Hugging Face, but I cannot find any. Have they been released yet? If so, where can I download them?
Thanks a lot, folks, for your contribution to the community!