Open lastrosade opened 6 months ago
Note that the llama.cpp server endpoint is OpenAI-compatible, so it would probably be sufficient to reuse the existing OpenAI endpoint code without requiring a model name or API key. It would also be useful to have a way to specify samplers such as min_p, top_k, and temperature. However, this approach would make it impossible to specify a prompt template and would default to ChatML.
By adding a llama.cpp server endpoint option, we could use features already present in llama.cpp directly, without having to rely on llama-cpp-python.
The llama.cpp server supports both HIP and Vulkan on Windows.
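To illustrate the idea, here is a rough sketch of how a client could talk to the llama.cpp server through its OpenAI-compatible `/v1/chat/completions` route while passing extra sampler fields. The server URL and the specific parameter values are assumptions for the example; the point is that no model name or API key is needed, and llama.cpp-specific samplers can ride along in the same JSON body.

```python
import json
from urllib import request

# Assumed default address of a local llama.cpp server; adjust as needed.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat request. llama.cpp accepts extra
    sampler fields (min_p, top_k) alongside the standard ones, and
    requires no model name or API key."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # standard OpenAI parameter
        "top_k": 40,         # llama.cpp sampler parameter
        "min_p": 0.05,       # llama.cpp sampler parameter
    }

def chat(prompt: str) -> str:
    """POST the request to the server and return the reply text."""
    req = request.Request(
        SERVER_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Show the request body that would be sent (no server needed for this).
payload = build_payload("Hello")
print(json.dumps(payload, indent=2))
```

Because the request shape is plain OpenAI JSON plus optional extra keys, existing OpenAI client code could be reused with only the base URL changed.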