Closed steven4547466 closed 1 year ago
@steven4547466 can you add a note about possibly needing to compile with cuBLAS support to the directions for GPU layers.
Sure
This is a giant PR, but it cleans up the base project a lot and implements local AI alongside OpenAI. This is great and, imo, should be considered "base".
The main point of these commits is to add the `BoogaAIController` and the `ProtocolAIController`. Part of this change comes with a small refactor, like moving the system message to a text file so that you can give different controllers different system messages, or switch between system messages easily for testing.

The booga controller is for the oobabooga text generation webui; it interfaces with the API that it exposes. The protocol controller is a generic controller that lets developers make their own text generation server, which will work as long as it follows the protocol.

The other parts of this commit are mainly cleanup, like commenting out the ZeroMQ YouTube chat manager (it is no longer used) and removing the related packages, which bloated the workspace and were giving me errors. Don't worry about the "Fix a couple infinite loops" commit and its revert; that was a mistake on my part.
Directions for oobabooga:

1. Drag the `Booga Manager` into the `Whole Thing` manager's `AIController` field.
2. Select the `Booga Manager` under `Managers` and, in the properties tab, see `Model`. You should set it to the name of a model in your `models/` folder, for example: `airoboros-l2-13b-2.2.Q5_K_M.gguf`.
3. For the other properties:
   a. `Threads` can be left at 0, which is automatic.
   b. `N Batch` can be left at 512 for most systems, but may need to be lowered on low-end systems.
   c. `N GPU Layers` should be set to the number of layers you want to offload to your GPU. This massively decreases generation time, but you will need a compatible, and strong, GPU, and llama.cpp must be compiled with cuBLAS (https://github.com/ggerganov/llama.cpp#cublas).
   d. Increasing `N Context` allows the API to understand more, but as long as the system message stays moderately low in token count, there's not really much point to increasing it. This may need to be decreased on low-end systems.

Directions for protocol (MOST PEOPLE WILL NEVER USE THIS, BUT THE OPTION IS THERE; REQUIRES DEVELOPMENT EXPERIENCE): I'm not going to explain how to make the server, but I am going to explain the protocol you must follow.
This project will send an HTTP POST request to the specified port (default 9998) with the following JSON structure:
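As an illustrative sketch of what constructing such a request could look like: everything here besides `AuthorRole` and the default port is an assumption, including the `Content` field name.

```python
import json
import urllib.request

PORT = 9998  # default port from the description above

# Hypothetical payload shape; only AuthorRole is confirmed by this PR.
payload = [
    {"AuthorRole": 0, "Content": "You are a helpful assistant."},  # system message
    {"AuthorRole": 1, "Content": "Hello!"},                        # user message
]

req = urllib.request.Request(
    f"http://localhost:{PORT}",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Actually sending it requires a running protocol server:
# reply = urllib.request.urlopen(req).read().decode("utf-8")
```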
There might be more entries; just know that `AuthorRole=0` is a system message, and `AuthorRole=1` is a user message. It is your job to respond to this request with a simple string. No JSON, no XML, just a simple string, which is the output, and a 200 status code.
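A minimal server sketch of the response side of the protocol, assuming (as above) that the payload is a JSON list of messages with hypothetical `Content` fields alongside `AuthorRole`:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProtocolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        messages = json.loads(self.rfile.read(length))
        # AuthorRole == 0 is a system message, AuthorRole == 1 is a user message.
        user_text = " ".join(
            m.get("Content", "") for m in messages if m.get("AuthorRole") == 1
        )
        reply = f"You said: {user_text}"  # placeholder for real text generation
        body = reply.encode("utf-8")
        # The protocol expects a bare string and a 200 status code -- no JSON.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

if __name__ == "__main__":
    HTTPServer(("", 9998), ProtocolHandler).serve_forever()  # default port 9998
```

The `reply` line is where a real implementation would call its own text generation backend.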
With this, anyone can also make their own controller by extending the `AIController` class, which allows for more possibilities in the future.