Additional instructions (Optional)
Its architecture should be almost identical to Mistral 7B without sliding-window attention, except that it uses the Tekken tokenizer rather than SentencePiece.
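One way to check the architecture claim is to diff the two models' `config.json` files. The following is a minimal sketch, assuming both files have already been downloaded locally. The helper names `load_config` and `diff_configs` are hypothetical and are not part of any library.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a locally saved config.json into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs.

    A key that is absent on one side is reported with None for that side,
    so an absent key and an explicit null are treated as equal.
    """
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            diff[key] = (va, vb)
    return diff
```

For example, `diff_configs(load_config("mistral-7b/config.json"), load_config("mistral-nemo/config.json"))` (both paths are placeholders) should list only the fields where the two architectures differ, such as layer sizes, vocabulary size, or the sliding-window setting. Tokenizer differences do not show up in `config.json` and need to be checked separately.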
Author
No
Security
[X] I confirm that the model is safe to run and does not contain any malicious code or content.
Integrity
[X] I confirm that the model comes from unique and original work and does not contain any plagiarism.
Model introduction
The model is a new 12B-parameter foundation model trained by Mistral and NVIDIA. It performs well on common non-coding benchmarks.
Model URL
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407