The script does not properly process the head_dim parameter, which results in a weight shape mismatch.
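For illustration, a minimal sketch of why deriving head_dim from hidden_size breaks here. The config values below are taken from Mistral-Nemo-Base-2407's config.json and should be treated as illustrative, not authoritative:

```python
# Mistral NeMo sets head_dim explicitly in config.json; it is NOT
# hidden_size // num_attention_heads, unlike earlier Mistral models.
# Illustrative values from Mistral-Nemo-Base-2407's config.json:
config = {"hidden_size": 5120, "num_attention_heads": 32, "head_dim": 128}

derived = config["hidden_size"] // config["num_attention_heads"]  # 160
actual = config.get("head_dim", derived)                          # 128

# q_proj weights are shaped (num_heads * head_dim, hidden_size), so a
# converter that uses the derived value computes the wrong target shape
# and fails to load the checkpoint weights.
wrong_shape = (config["num_attention_heads"] * derived, config["hidden_size"])
right_shape = (config["num_attention_heads"] * actual, config["hidden_size"])
```

Reading head_dim from the config (falling back to the derived value only when the key is absent) would keep the converter working for both the older 7B checkpoints and the new model.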
Supporting the above also requires upgrading the transformers package to the latest version, which introduces additional breakage: ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST is no longer defined in transformers, but is still referenced in nemo/collections/nlp/modules/common/huggingface/huggingface_utils.py.
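One possible workaround (a sketch, not NeMo's actual fix): guard the import so huggingface_utils.py still loads on newer transformers releases, which no longer ship the *_PRETRAINED_MODEL_ARCHIVE_LIST constants. The fallback checkpoint names are illustrative:

```python
# On older transformers the constant imports fine; on newer releases the
# import fails, and we substitute an explicit list of checkpoint names.
try:
    from transformers.models.albert.modeling_albert import (
        ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
    )
except ImportError:
    # Newer transformers removed the archive-list constants; this subset
    # of ALBERT checkpoint names is shown for illustration only.
    ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST = ["albert-base-v1", "albert-base-v2"]
```

ModuleNotFoundError subclasses ImportError, so this guard also covers environments where transformers is absent entirely.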
Tokenizer conversion is not handled correctly: the script looks for tokenizer.model, which does not exist for this new model; Mistral NeMo uses the Tekken tokenizer, and the tokenizer file is a JSON.
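A hedged sketch of how the conversion script could locate the right tokenizer artifact instead of hard-coding tokenizer.model; the find_tokenizer helper is an assumption for illustration, not NeMo's actual logic, and the candidate filenames reflect what the respective checkpoints typically ship:

```python
from pathlib import Path

def find_tokenizer(model_dir: str) -> Path:
    """Return the first tokenizer artifact present in a HF checkpoint dir."""
    d = Path(model_dir)
    candidates = (
        d / "tokenizer.model",  # sentencepiece (older Mistral 7B checkpoints)
        d / "tekken.json",      # Tekken tokenizer (Mistral NeMo)
        d / "tokenizer.json",   # HF fast-tokenizer export
    )
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError(f"no tokenizer artifact found in {model_dir}")
```

The converter could then branch on the returned filename to build either a sentencepiece-based or a JSON-based tokenizer config.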
Describe the solution you'd like
Upgrade the transformers package to the latest version.
Update the Mistral conversion script to support the latest Mistral NeMo model.
Is your feature request related to a problem? Please describe.
Currently the only provided conversion script is https://github.com/NVIDIA/NeMo/blob/main/scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py. This script does not support converting the latest https://huggingface.co/mistralai/Mistral-Nemo-Base-2407 model; it fails with the issues described above.