DefamationStation / Retrochat-v2

RetroChat is a powerful command-line interface for interacting with various AI language models. It provides a seamless experience for engaging with different chat providers while offering robust features for managing and customizing your conversations. The code in this repo is 100% AI generated. Nothing has been written by a human.

Oobabooga support for local generations? #2

Open hexive opened 2 months ago

hexive commented 2 months ago

Any chance you'd consider adding Oobabooga support as a provider? https://github.com/oobabooga/text-generation-webui

They have an API that is "drop-in compatible" with OpenAI, so maybe it wouldn't be too much work?
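For reference, their OpenAI-compatible server listens on http://127.0.0.1:5000/v1 by default, so I'd expect a request like this to work against it (rough sketch, untested here; the model field can usually be omitted since WebUI just uses whatever model is loaded):

import requests

# assumes text-generation-webui was started with its API enabled (default port 5000)
resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])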

RetroChat is a super fun tool. Thanks for putting it out there. I'm using it mostly with ollama, but there are some models that run better and faster locally with ooba.

DefamationStation commented 2 months ago

Can you please test if it works? I don't have that provider.

Make sure you try a few features like /load and @ to test RAG, check that the markdown syntax renders properly, and let me know so I can push it to the main script!

https://github.com/DefamationStation/Retrochat-v2/blob/main/retrochat_oogabooga.py

It looks at http://127.0.0.1:5000.
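If you want a quick sanity check that the endpoint is actually up before chatting, something like this should do it (sketch, assuming the usual OpenAI-compatible routes):

import requests

# lists whatever the OpenAI-compatible API reports as loaded/available models
print(requests.get("http://127.0.0.1:5000/v1/models").json())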

hexive commented 2 months ago

ooh this is awesome! thank you!

On a quick test just now, I was able to connect through the API on localhost:5000 just fine, and it looks like output is being generated by ooba, but nothing is being displayed on the RetroChat side.

I can dig deeper into the oobaboogaChatSession class tonight and see if I can figure out what's going on with that.

hexive commented 2 months ago

I'm not a programmer, but adding these lines to the ChatSession class got the output to print in RetroChat:

# inside the streaming request handler; assumes `import json` at module level,
# `response` is the streaming HTTP response from the chat completions call,
# and `complete_message` was initialized to '' before this loop
async for line in response.content:
    if line:
        try:
            decoded_line = line.decode('utf-8').strip()
            # the API streams server-sent events: each payload line starts with 'data: '
            if decoded_line.startswith('data: '):
                json_str = decoded_line[6:]  # Remove 'data: ' prefix
                if json_str != '[DONE]':  # '[DONE]' marks the end of the stream
                    response_json = json.loads(json_str)
                    message_content = response_json['choices'][0]['delta'].get('content', '')
                    if message_content:
                        complete_message += message_content
                        yield message_content  # Yield each chunk for streaming
        except json.JSONDecodeError:
            # skip keep-alive or partial lines that aren't valid JSON
            continue

That's from Claude, haha. Does that make any sense to you?

Although, for a reason I don't understand, it prints asynchronously and then prints a duplicate of the text again at the end.
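My guess at the duplicate (just a hunch, I haven't traced the script): if the calling code prints each yielded chunk and then also prints the accumulated complete_message after the loop, the whole reply shows up twice. A tiny self-contained demo of the pattern:

import asyncio

async def fake_stream():
    # stand-in for the oobabooga streaming generator above
    for chunk in ["Hel", "lo", "!"]:
        yield chunk

async def main():
    complete_message = ""
    async for chunk in fake_stream():
        complete_message += chunk
        print(chunk, end="", flush=True)  # print each piece once as it arrives
    print()
    # printing complete_message here as well would show the text a second time

asyncio.run(main())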

DefamationStation commented 2 months ago

I'm trying to implement it properly and I am running the Text Generation WebUI, but I'm not sure how to make calls to it or even run it as a server. Could you please provide some additional information so I can get it up and running and test it? I can open the WebUI and run models just fine on it.

hexive commented 2 months ago

Oh sure, you probably just need to start the program with the --api flag. That will open the default socket on :5000, and that's how I was able to successfully send and receive with RetroChat in my tests.

So however you launch the WebUI, just add --api at the end.
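For example, if you run it from source it'd be something like (paths depend on your install):

python server.py --api

and I think with the one-click installers you can put --api in CMD_FLAGS.txt instead.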

Edit to add: make sure you have a model loaded in the WebUI or you'll get some strange errors when you start the chat in RetroChat.