📚 The doc issue
Since llama.cpp now supports the OLMo architecture, it might make sense to mention this inference and quantization option in the README. It's especially useful on CPUs and with Metal. I would also recommend linking to a GGUF version of the instruct model.
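For illustration, a minimal sketch of what such a README snippet could show, using the llama-cpp-python bindings to run a quantized GGUF model (the filename below is a placeholder, not an official release artifact):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path is a placeholder; point it at whichever OLMo instruct GGUF file you download.
from llama_cpp import Llama

llm = Llama(
    model_path="OLMo-7B-Instruct.Q4_K_M.gguf",  # hypothetical quantized GGUF filename
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers (e.g. Metal on macOS); set to 0 for CPU-only
)

output = llm(
    "What is the OLMo project?",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

The same GGUF file also works with the llama.cpp command-line tools directly, so the README could point to either option.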
Suggest a potential alternative/fix
No response