OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Mistral-Nemo not working #1793

Closed by BBC-Esq 1 month ago

BBC-Esq commented 1 month ago

The Mistral-Nemo model converts without errors but fails at inference time. This is likely related to the issue identified here:

https://github.com/OpenNMT/CTranslate2/issues/1743

    results_batch = generator.generate_batch(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: axis 2 has dimension 6144 but expected 7680
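
Note on the two numbers in the error: they are consistent with a head-dimension mismatch. The sketch below is only an illustration. It assumes Mistral-Nemo's published configuration (hidden size 5120, 32 attention heads, 8 key/value heads, an explicit `head_dim` of 128), none of which is stated in this thread, and it assumes the axis being checked is a fused query/key/value projection. Under those assumptions, 6144 is the width the weights actually have, and 7680 is the width you get if the head size is derived as hidden size divided by head count.

```python
# Assumed values taken from Mistral-Nemo's published config.json.
# They are NOT given in this thread; treat them as assumptions.
HIDDEN_SIZE = 5120
NUM_HEADS = 32
NUM_KV_HEADS = 8
EXPLICIT_HEAD_DIM = 128


def fused_qkv_width(num_heads: int, num_kv_heads: int, head_dim: int) -> int:
    """Output width of a fused Q/K/V projection.

    Q contributes num_heads heads; K and V contribute num_kv_heads heads each.
    """
    return (num_heads + 2 * num_kv_heads) * head_dim


# Head size as stored in the checkpoint (explicit head_dim).
actual_width = fused_qkv_width(NUM_HEADS, NUM_KV_HEADS, EXPLICIT_HEAD_DIM)

# Head size if it is derived from hidden_size instead of read from the config.
derived_head_dim = HIDDEN_SIZE // NUM_HEADS
expected_width = fused_qkv_width(NUM_HEADS, NUM_KV_HEADS, derived_head_dim)

print(f"derived head_dim: {derived_head_dim}")
print(f"width with explicit head_dim: {actual_width}")
print(f"width with derived head_dim:  {expected_width}")
```

If the assumptions hold, the two printed widths match the "has dimension" and "expected" values in the traceback, which would point at the converter's handling of the head size and not at the checkpoint itself.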

minhthuc2502 commented 1 month ago

It will be fixed in the next release

BBC-Esq commented 1 month ago

> It will be fixed in the next release

Cool! BTW, I verified that the new Mistral-Small model does in fact work. Unfortunately, even running at int8, it spills over into system memory despite my 24 GB.

https://huggingface.co/mistralai/Mistral-Small-Instruct-2409
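
A rough back-of-the-envelope estimate shows why int8 is tight on a 24 GB card. This is a minimal sketch that assumes the model has roughly 22 billion parameters (the figure on its model card, not stated in this thread) and counts the weights only. The KV cache, activations, and runtime overhead come on top, and their size depends on batch size and context length.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 22 billion parameters for Mistral-Small-Instruct-2409.
NUM_PARAMS = 22e9

# Bytes per parameter for a few storage formats (4-bit is half a byte,
# ignoring the extra scales and zero points that 4-bit schemes also store).
FORMATS = [("float16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]

for name, bytes_per_param in FORMATS:
    gib = weight_memory_gib(NUM_PARAMS, bytes_per_param)
    print(f"{name:>8}: ~{gib:.1f} GiB for weights")
```

Under that assumption the int8 weights alone come to a little over 20 GiB, leaving only a few GiB for everything else, so spilling into system memory at longer contexts is plausible. A 4-bit format would roughly halve the weight footprint.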

Also, I'd love to try it with AWQ. Any chance the docs could include examples of how to use CTranslate2's new 4-bit mode? Thanks. Feel free to close the issue whenever.

BBC-Esq commented 1 month ago

I confirmed that the issue is fixed when the model is converted with this change applied:

https://github.com/OpenNMT/CTranslate2/pull/1785/files

Closing for now unless you think others might want to reference this issue before the next release.