RCGAI / SimplyRetrieve

Lightweight chat AI platform featuring custom knowledge, open-source LLMs, prompt-engineering, retrieval analysis. Highly customizable. For Retrieval-Centric & Retrieval-Augmented Generation.
MIT License

Request: local text-generation-webui support #12

Open JeremyBickel opened 10 months ago

JeremyBickel commented 10 months ago

Will you add support for oobabooga's text-generation-webui? An LLM initialization for POST requests and a few patterns might be sufficient. I've been trying to do it myself, but I've had to work out exactly what's being sent from llms/initialize.py, and through langchain, and possibly textgen as well. I have a hackneyed version now, but it doesn't support streaming and just isn't proper.

RCGAI commented 10 months ago

Hi @JeremyBickel, I think it is possible to add feature support if it is not too complicated to implement. Exactly which features would you like to have? What do you mean by "an LLM initialization for POST requests"? Something like a dropdown menu for quickly switching between different LLM models?

JeremyBickel commented 10 months ago

@RCGAI Here's what I came up with, which replaces llms/initialize.py. The first four function defs demonstrate what I meant by "patterns". It works for single-message completions, but not for chat; I couldn't get the history to work right. But the main thing is that it's geared toward POST requests, which will let me use the backend I like. https://github.com/oobabooga/text-generation-webui exposes an OpenAI-like API for serving local models, and it can be accessed with simple POST requests.

```python
import requests

def completion(prompts):
    data = {}
    uri = 'http://127.0.0.1:5000/v1/completions'
    data["prompt"] = prompts
    data["max_tokens"] = 2000
    data["temperature"] = 0
    data["top_p"] = 0.95
    data["seed"] = 142857
    return data, uri

def chat_completion():
    data = {}
    uri = 'http://127.0.0.1:5000/v1/chat/completions'
    data["messages"] = [{"role": "user", "content": "Hello!"}]
    data["instruction_template"] = "Alpaca"
    data["mode"] = "instruct"
    return data, uri

def chat_completion_with_characters():
    data = {}
    uri = 'http://127.0.0.1:5000/v1/chat/completions'
    data["messages"] = [{"role": "user", "content": "How are you?"}]
    data["mode"] = "chat"
    data["character"] = "Example"
    return data, uri

def chat_completion_with_streaming(prompt):
    data = {}
    uri = 'http://127.0.0.1:5000/v1/chat/completions'
    data["messages"] = [{"role": "user", "content": prompt}]
    data["mode"] = "instruct"
    data["instruction_template"] = "Alpaca"
    data["stream"] = True
    return data, uri

def initialize_llm(model_args={}, pipeline_args={}):
    headers = {
        "Content-Type": "application/json"
    }
    # history = []  # chat history handling not implemented yet

    def generate_fn(prompts, **kwargs):
        # Build the request payload, POST it to the local API, and
        # return the generated text from the OpenAI-style response.
        data, uri = completion(prompts)
        print("SENDING...")
        response = requests.post(uri, headers=headers, json=data, verify=False)
        res_text = response.json()['choices'][0]['text']
        print(f"RESPONSE TEXT: {res_text}")
        return res_text

    main_agent_llm = generate_fn
    return main_agent_llm, None
```
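
For completeness, here's a minimal usage sketch of the drop-in above; it assumes text-generation-webui is running locally with its OpenAI-compatible API enabled (e.g. launched with the --api flag):

```python
# Minimal usage sketch: assumes text-generation-webui is serving its
# OpenAI-compatible API on http://127.0.0.1:5000.
generate, _ = initialize_llm()
reply = generate("Explain retrieval-augmented generation in one sentence.")
print(reply)
```
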
RCGAI commented 10 months ago

@JeremyBickel thanks for the detailed description. Access to an OpenAI-like API seems like a good feature, and updating llms/initialize.py is indeed the right direction I think, so your code looks great. I believe below are the features needed to implement it properly. Please let me know if my understanding is correct, and whether anything else needs to be added.

JeremyBickel commented 10 months ago

Chat history and streaming seem like the priorities, but the text-generation-webui backend supports other OpenAI-compatible endpoints as well. That name is hard to write all the time, so I just call it ooba, a truncation of the author's handle. It has extensions with multimodal support too, which might be useful, but I don't want to pull you away from whatever direction you need to go to accomplish your vision for the project. If you're interested, though, here's a link for information about the API. It has training, too.

If all that is too different from where you're headed here, then simple POST request support would do fine. I just have a hard time keeping track of exactly what format the HuggingFace pipeline's output takes, or what the streaming object is supposed to be. But perhaps those are trivial matters for you, since you've already spent time with this code.
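
For what it's worth, OpenAI-compatible servers typically stream Server-Sent Events, where each line is a `data: {...}` JSON chunk. A rough consumer sketch might look like the following; the chunk field names are my assumption about the OpenAI-style format, not verified against ooba's implementation:

```python
import json
import requests

def stream_chat(prompt):
    # Sketch only: assumes the server emits SSE lines of the form
    # "data: {...}" and terminates the stream with "data: [DONE]".
    uri = 'http://127.0.0.1:5000/v1/chat/completions'
    data = {
        "messages": [{"role": "user", "content": prompt}],
        "mode": "instruct",
        "instruction_template": "Alpaca",
        "stream": True,
    }
    with requests.post(uri, json=data, stream=True, verify=False) as response:
        for line in response.iter_lines():
            if not line:
                continue
            text = line.decode('utf-8')
            if not text.startswith('data: '):
                continue
            payload = text[len('data: '):]
            if payload.strip() == '[DONE]':
                break
            chunk = json.loads(payload)
            # OpenAI-style streaming puts incremental text in choices[0].delta
            delta = chunk['choices'][0].get('delta', {})
            if 'content' in delta:
                yield delta['content']
```

Calling `for token in stream_chat("Hello"): print(token, end="")` would then print the reply as it arrives.
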

But if you don't mind the suggestion, I highly recommend a little exploration of ooba, because, for one thing, it has a lively group of active developers whose changes seem consistently productive. But that's just my couple of pennies' worth, and I leave it to you.

Thank you for this project. I appreciate the simplicity.

RCGAI commented 10 months ago

Thanks for the comments and suggestions. I will take a look at ooba's repository and the API. Supporting POST requests with chat history and streaming seems productive and not difficult to implement, so I will work on it. For the other, more sophisticated features, I will take a look at ooba's and figure out whether the direction and simplicity are consistent with this repository. I will keep you posted. I will be on my new year break for the next two weeks, so let me work on the feature additions around mid-January. Thanks!
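
As a starting point for the chat-history part, one plausible approach (a sketch of my own, not the repository's actual implementation) is to keep an OpenAI-style message list across turns and append each exchange before posting:

```python
import requests

def make_chat_fn(uri='http://127.0.0.1:5000/v1/chat/completions'):
    # Sketch only: field names follow the chat_completion() example above;
    # the choices[0].message.content path is the standard OpenAI chat format.
    history = []

    def chat_fn(user_message):
        history.append({"role": "user", "content": user_message})
        data = {
            "messages": history,
            "mode": "instruct",
            "instruction_template": "Alpaca",
        }
        response = requests.post(uri, json=data, verify=False)
        reply = response.json()['choices'][0]['message']['content']
        history.append({"role": "assistant", "content": reply})
        return reply

    return chat_fn
```

Because `history` is closed over by `chat_fn`, repeated calls carry the full conversation to the backend, which is what the single-completion version above was missing.
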