📚 The doc issue
Since llama.cpp now supports the OLMo architecture, it might make sense to mention this inference and quantization option in the README. It's especially useful on CPUs and with Metal. I would also recommend linking to a GGUF version of the instruct model.
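For illustration, a minimal sketch of what such a README snippet could show, using the llama-cpp-python bindings to run a quantized GGUF model (the filename below is a placeholder, not an official release artifact):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path is a placeholder; point it at whichever OLMo instruct GGUF file you download.
from llama_cpp import Llama

llm = Llama(
    model_path="OLMo-7B-Instruct.Q4_K_M.gguf",  # hypothetical quantized GGUF filename
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers (e.g. Metal on macOS); set to 0 for CPU-only
)

output = llm(
    "What is the OLMo project?",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

The same GGUF file also works with the llama.cpp command-line tools directly, so the README could point to either option.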
Suggest a potential alternative/fix
No response