Closed: snxraven closed this issue 1 year ago
Added in 9a3960c9589ca254358e0f6a230e9c2571e0c5bc
Supports using a local model via abetlen/llama-cpp-python. Untested with other local backends, but if they have the same interface it should work.
The 13B-parameter GGML model with 4-bit quantization performed pretty badly, often giving the same command multiple times. Something trained more specifically for conversational AI instead of just text completion might work better?
Wonderful! I will be playing with this.
"Something trained more specifically for conversational AI instead of just text completion might work better?" I totally agree; some thinking needs to be done there.
Thank you for the work!
With the advancements in at-home AI, it would be amazing to see support for the following backend:
https://abetlen.github.io/llama-cpp-python/
Web Server
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
To install the server package and get started:
pip install llama-cpp-python[server]
export MODEL=./models/7B
python3 -m llama_cpp.server

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
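Before wiring it into the tool, the local server can be sanity-checked with a plain HTTP request. A minimal Go sketch using only the standard library, assuming the server exposes the OpenAI-style /v1/completions endpoint as the docs describe:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ask the local llama-cpp-python server for a completion via the OpenAI-style endpoint.
	body := []byte(`{"prompt": "Hello", "max_tokens": 16}`)
	resp, err := http.Post("http://localhost:8000/v1/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()

	// Print the raw status and JSON body so any mismatch with the OpenAI API is visible.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```

If this returns a 200 with an OpenAI-shaped JSON response, any OpenAI-compatible client should work once its base URL is pointed at the local server.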
Redirecting the API URL within the source of the Go library used for the OpenAI API gets about this far:
Then I am met with this error:
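For reference, the base URL can usually be overridden from client code rather than by patching the library source. A minimal sketch assuming the sashabaranov/go-openai client, which may not be the library this project actually uses:

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// The token is ignored by the local server, but the client requires a non-empty value.
	config := openai.DefaultConfig("sk-local")
	// Point the client at the llama-cpp-python server instead of api.openai.com.
	config.BaseURL = "http://localhost:8000/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model: "local-model", // placeholder name; the local server loads whatever MODEL points to
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "List the files in the current directory."},
		},
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```

Overriding BaseURL through the client config avoids editing the vendored library and leaves the rest of the calling code unchanged.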