eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Add a normal llama.cpp server endpoint option. #338

Open lastrosade opened 6 months ago

lastrosade commented 6 months ago

By adding a llama.cpp server endpoint option, we could use features already present in llama.cpp directly, without having to rely on llama-cpp-python.

The llama.cpp server supports both HIP and Vulkan on Windows.

lastrosade commented 6 months ago

Note that the llama.cpp server endpoint is OpenAI-compatible, so it would probably be sufficient to reuse the OpenAI endpoint code without any model or API key requirements. It would also help to have a way to specify samplers such as min_p, top_k, and temperature. However, this approach would make it impossible to specify a prompt template, and ChatML would be used by default.
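As a rough illustration of what reusing the OpenAI endpoint code could look like: the llama.cpp server exposes an OpenAI-style `/v1/chat/completions` route and additionally accepts sampler fields like `top_k` and `min_p` in the request body. The sketch below only builds such a request payload; the URL, model name, and sampler defaults are illustrative assumptions, not part of LMQL or this proposal.

```python
import json

# Assumed local llama.cpp server address (llama.cpp's default port is 8080);
# purely illustrative, not an LMQL configuration value.
LLAMA_CPP_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str, temperature: float = 0.7,
                  top_k: int = 40, min_p: float = 0.05) -> dict:
    """Build an OpenAI-style chat completion payload.

    llama.cpp's server also honors extra sampler fields (top_k, min_p)
    that the official OpenAI API does not define.
    """
    return {
        # llama.cpp serves whichever model it was started with,
        # so no real model selection or API key is needed.
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_k": top_k,    # llama.cpp-specific sampler parameter
        "min_p": min_p,    # llama.cpp-specific sampler parameter
    }

payload = build_request("Hello")
print(json.dumps(payload, indent=2))
```

The payload could then be POSTed to `LLAMA_CPP_URL` with any HTTP client; since no prompt template can be passed this way, the server would apply its default (ChatML) formatting, as noted above.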