fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License

Questions about Inference #47

Closed · kira-lin closed this issue 5 months ago

kira-lin commented 5 months ago

I just tried out ALMA-7B-R on my RTX 4070S and it's great! However, I wonder whether it's possible to speed up inference further. Specifically, are there any quantized versions? Can I use llama.cpp to run this?

Thanks
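(For context, a minimal sketch of the unquantized baseline being compared against, following the usage example in the ALMA README; the generation settings here are illustrative, not the only valid ones:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load ALMA-7B-R in fp16; device_map="auto" places layers on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-7B-R", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-7B-R", padding_side="left")

# ALMA's translation prompt format, as shown in the README.
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    generated = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```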

fe1ixxu commented 5 months ago

Thanks for your interest! There are some unofficial GGUF releases on Hugging Face: https://huggingface.co/RichardErkhov/haoranxu_-_ALMA-13B-R-gguf

Please enjoy them :)
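(For anyone landing here later: once a GGUF file from that repo is downloaded, it can be run through llama.cpp's Python bindings, llama-cpp-python. A minimal sketch, assuming a Q4_K_M quantization; the file name below is hypothetical, so substitute whichever quantization level you fetched:)

```python
from llama_cpp import Llama

# Hypothetical local file name; substitute the GGUF you downloaded
# from the Hugging Face repo linked above.
llm = Llama(
    model_path="./ALMA-13B-R.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU if VRAM allows
    n_ctx=2048,
)

# Same translation prompt format as the full-precision model.
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
out = llm(prompt, max_tokens=64, temperature=0.6, top_p=0.9)
print(out["choices"][0]["text"].strip())
```

The same GGUF file should also work directly with the llama.cpp command-line tool (`llama-cli -m <file.gguf> -p "<prompt>"`), if you prefer not to go through Python.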