Open JeremyBickel opened 11 months ago

Will you add support for oobabooga's text-generation-webui? An llm initialization for post requests and a few patterns might be sufficient. I've been trying to do it myself, but I've had to work out what's being sent from llms/initialize.py and langchain, and, was textgen involved? I have a hackneyed version now, but it doesn't support streaming and just isn't proper.
Hi @JeremyBickel, I think it is possible to add feature support if it is not too complicated to implement. Exactly which features would you like to have? What do you mean by "An llm initialization for post requests"? Something like a dropdown menu for quickly switching between different LLM models?
@RCGAI Here's what I came up with, which replaces llms/initialize.py. The first four function defs demonstrate what I meant by "patterns". It works for single-message completions, but not for chat; I couldn't get the history to work right. The main thing is that it's geared toward POST requests, which will let me use the backend I like: https://github.com/oobabooga/text-generation-webui exposes an OpenAI-like API for serving local models, and it can be accessed with simple POST requests.
```python
import requests


def completion(prompts):
    # Single-message completion against the OpenAI-like /v1/completions endpoint.
    data = {}
    uri = "http://127.0.0.1:5000/v1/completions"
    data["prompt"] = prompts
    data["max_tokens"] = 2000
    data["temperature"] = 0
    data["top_p"] = 0.95
    data["seed"] = 142857
    return data, uri


def chat_completion():
    data = {}
    uri = "http://127.0.0.1:5000/v1/chat/completions"
    data["messages"] = [{"role": "user", "content": "Hello!"}]
    data["instruction_template"] = "Alpaca"
    data["mode"] = "instruct"
    return data, uri


def chat_completion_with_characters():
    data = {}
    uri = "http://127.0.0.1:5000/v1/chat/completions"
    data["messages"] = [{"role": "user", "content": "How are you?"}]
    data["mode"] = "chat"
    data["character"] = "Example"
    return data, uri


def chat_completion_with_streaming(prompt):
    data = {}
    uri = "http://127.0.0.1:5000/v1/chat/completions"
    data["messages"] = [{"role": "user", "content": prompt}]
    data["mode"] = "instruct"
    data["instruction_template"] = "Alpaca"
    data["stream"] = True
    return data, uri


def initialize_llm(model_args={}, pipeline_args={}):
    headers = {"Content-Type": "application/json"}

    def generate_fn(prompts, **kwargs):
        # Build the request from the completion pattern and POST it.
        data, uri = completion(prompts)
        print("SENDING...")
        response = requests.post(uri, headers=headers, json=data, verify=False)
        res_text = response.json()["choices"][0]["text"]
        print(f"RESPONSE TEXT: {res_text}")
        return res_text

    main_agent_llm = generate_fn
    return main_agent_llm, None
```
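To be clear about how this is meant to plug in, here's a tiny, purely illustrative usage sketch; I'm guessing at how the caller consumes the returned pair (the None just fills whatever the second slot normally holds):

```python
# Illustrative only: the returned callable takes a prompt string
# and returns the generated text.
llm, _ = initialize_llm()
print(llm("Hello!"))
```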
@JeremyBickel thanks for the detailed descriptions. Access to an OpenAI-like API seems like a good feature, and updating llms/initialize.py is indeed the right direction I think, so your code looks great. I believe the features below are needed to implement it properly. Please let me know if my understanding is correct and whether anything else needs to be added.

- POST request support in llms/initialize.py (which is already drafted in your code above)
- chat history
- streaming

Chat history and streaming seem like priorities, but there are other OpenAI-compatible endpoints supported by the text-generation-webui backend. That name is hard to write all the time, so I just call it ooba, a truncation of the author's handle. It has extensions with multimodal support, too, which might be useful, but I don't want to pull you away from whatever direction you have to go to accomplish your vision for the project. If you're interested, though, here's a link for information about the API. It has training, too.
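For the history part, here is roughly the shape I was aiming for but couldn't get working: keep an OpenAI-style messages list in a closure and replay the whole list on every POST. Treat the field names (message.content in the response, the mode and instruction_template knobs) as my assumptions about ooba's endpoint, not verified behavior:

```python
import requests

URI = "http://127.0.0.1:5000/v1/chat/completions"
HEADERS = {"Content-Type": "application/json"}


def make_chat_fn():
    history = []  # accumulated {"role": ..., "content": ...} turns

    def chat(user_message):
        # Send the full conversation so far, plus the new user turn.
        history.append({"role": "user", "content": user_message})
        data = {
            "messages": history,
            "mode": "instruct",
            "instruction_template": "Alpaca",
            "max_tokens": 2000,
        }
        response = requests.post(URI, headers=HEADERS, json=data, verify=False)
        reply = response.json()["choices"][0]["message"]["content"]
        # Remember the assistant turn so the next call carries context.
        history.append({"role": "assistant", "content": reply})
        return reply

    return chat
```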
If all that's too different from where you're headed, then simple POST request support would do fine. I just have a hard time keeping track of exactly what format the HuggingFace pipeline's output takes, or what the streaming object is supposed to be. But perhaps those are trivial matters to you, since you've already spent time with this code.
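On the streaming object specifically: my understanding is that with "stream": True an OpenAI-style endpoint sends server-sent events, one "data: {...}" line per chunk, ending with "data: [DONE]". Here's a sketch of consuming that with plain requests, assuming ooba follows the same convention; the chunk field names are unverified:

```python
import json
import requests


def stream_completion(prompt, uri="http://127.0.0.1:5000/v1/completions"):
    data = {"prompt": prompt, "max_tokens": 2000, "stream": True}
    with requests.post(uri, json=data, stream=True, verify=False) as response:
        for raw_line in response.iter_lines():
            if not raw_line:
                continue  # SSE events are separated by blank lines
            line = raw_line.decode("utf-8")
            if not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload.strip() == "[DONE]":
                break
            # Each chunk carries an incremental piece of the completion.
            yield json.loads(payload)["choices"][0].get("text", "")


# e.g.: for piece in stream_completion("Hello!"): print(piece, end="", flush=True)
```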
But if you don't mind the suggestion, I highly recommend a little exploration of ooba. For one thing, it has a lively group of active developers, and their changes seem to be consistently productive. That's just my two pennies' worth, though, and I leave it to you.
Thank you for this project. I appreciate the simplicity.
Thanks for the comments and suggestions. I will take a look at ooba's repository and the API. Supporting POST requests with chat history and streaming seems productive and not difficult to implement, so I will work on it. For the other, more sophisticated features, I will look at ooba's and figure out whether the direction and simplicity are consistent with this repository. I will keep you posted. I have my new year break for the next two weeks, so let me work on the feature additions around mid-January. Thanks!