Closed: snxraven closed this issue 1 year ago
Added in 9a3960c9589ca254358e0f6a230e9c2571e0c5bc
Supports using a local model via abetlen/llama-cpp-python. Untested with other local backends, but if they have the same interface it should work.
The 13B-parameter GGML model with 4-bit quantization performed pretty badly, often giving the same command multiple times. Something trained more specifically for conversational AI instead of just text completion might work better?
Wonderful! I will be playing with this.
"Something trained more specifically for conversational AI instead of just text completion might work better?" I totally agree; some thinking needs to be done there.
Thank you for the work!
With the advancements in at-home AI, it would be amazing to see support for the following backend:
https://abetlen.github.io/llama-cpp-python/
Web Server
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
To install the server package and get started:
pip install llama-cpp-python[server]
export MODEL=./models/7B
python3 -m llama_cpp.server

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
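Before wiring it into the tool, the local server can be sanity-checked with a plain HTTP request. A minimal Go sketch using only the standard library, assuming the server exposes the OpenAI-style /v1/completions endpoint as the docs describe:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ask the local llama-cpp-python server for a completion via the OpenAI-style endpoint.
	body := []byte(`{"prompt": "Hello", "max_tokens": 16}`)
	resp, err := http.Post("http://localhost:8000/v1/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()

	// Print the raw status and JSON body so any mismatch with the OpenAI API is visible.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```

If this returns a 200 with an OpenAI-shaped JSON response, any OpenAI-compatible client should work once its base URL is pointed at the local server.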
Redirecting the API URL within the source of the Go library used for the OpenAI API gets about this far:
Then I am met with this error:
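For reference, the base URL can usually be overridden from client code rather than by patching the library source. A minimal sketch assuming the sashabaranov/go-openai client, which may not be the library this project actually uses:

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// The token is ignored by the local server, but the client requires a non-empty value.
	config := openai.DefaultConfig("sk-local")
	// Point the client at the llama-cpp-python server instead of api.openai.com.
	config.BaseURL = "http://localhost:8000/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model: "local-model", // placeholder name; the local server loads whatever MODEL points to
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "List the files in the current directory."},
		},
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```

Overriding BaseURL through the client config avoids editing the vendored library and leaves the rest of the calling code unchanged.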