dnhkng / GlaDOS

This is the Personality Core for GLaDOS, the first steps towards a real-life implementation of the AI from the Portal series by Valve.

Can't connect to external LLM servers #73

Open · faneQ123 opened this issue 3 months ago

faneQ123 commented 3 months ago

Hi, I'm having this issue connecting to external LLMs. Environment: a server hosting the remote LLM.

Attached photos, for LM Studio and ollama serve:

[screenshots 1, 2, 3]
dnhkng commented 3 months ago

Try again with LM-Studio, and change your completion URL to: "http://{your-local-ip:port_for_lm-studio}/v1"

I see that LM-Studio has a normal endpoint: https://lmstudio.ai/docs/local-server

Ollama has a non-standard POST endpoint... probably just to annoy me... https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion
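
For reference, here's a minimal sketch of the two request shapes being discussed. The LAN address, ports, and model name are placeholders, not values from this repo, and it assumes the default LM Studio (1234) and Ollama (11434) ports:

    import requests

    LAN_IP = "192.168.1.50"  # placeholder: the machine running the LLM server

    # LM Studio: OpenAI-style completions endpoint under /v1 (default port 1234)
    lm = requests.post(
        f"http://{LAN_IP}:1234/v1/completions",
        json={"prompt": "Hello, GLaDOS.", "max_tokens": 16, "stream": False},
        timeout=30,
    )
    print(lm.json()["choices"][0]["text"])

    # Ollama: its own /api/generate endpoint with a different request/response
    # shape (default port 11434); "llama3" is a placeholder model name
    ol = requests.post(
        f"http://{LAN_IP}:11434/api/generate",
        json={"model": "llama3", "prompt": "Hello, GLaDOS.", "stream": False},
        timeout=30,
    )
    print(ol.json()["response"])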

dnhkng commented 3 months ago

Dammit, llama.cpp|server uses /completion, not /completions

Maybe try firing up the llama.cpp server on the server first? It's the backend of both Ollama and LM Studio anyway.
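
If you do try the llama.cpp server directly, a quick connectivity check might look like this (placeholder address, assuming the stock server on its default port 8080; note the singular /completion path):

    import requests

    # llama.cpp's bundled server exposes POST /completion (singular), default port 8080
    resp = requests.post(
        "http://192.168.1.50:8080/completion",  # placeholder LAN address
        json={"prompt": "Hello, GLaDOS.", "n_predict": 16, "stream": False},
        timeout=30,
    )
    print(resp.json()["content"])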

faneQ123 commented 3 months ago

> Try again with LM-Studio, and change your completion URL to: "http://{your-local-ip:port_for_lm-studio}/v1"
>
> I see that LM-Studio has a normal endpoint: https://lmstudio.ai/docs/local-server
>
> Ollama has a non-standard POST endpoint... probably just to annoy me... https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion

Ok, I tried, same issue:

[screenshot attached]

faneQ123 commented 3 months ago

> Dammit, llama.cpp|server uses /completion, not /completions
>
> Maybe try firing up the llama.cpp server on the server first? It's the backend of both Ollama and LM Studio anyway.

The LM Studio server runs on llama.cpp, and it's already started.

EikaMikiku commented 2 months ago

There are a bunch of changes required to make it work. As is, unless I'm misunderstanding, it doesn't seem to follow the OpenAI-style completions API.

To make it more or less work, the changes below are all in glados.py. 1) _process_line needs to look something like this:

        if line["choices"][0]["finish_reason"] == "null":
            token = line["choices"][0]["text"]
            return token
        return None

2) In process_LLM, inside the "if line:" block:

                            line = line.decode("utf-8")
                            # stop once the server sends its end-of-stream marker
                            if line == "data: [DONE]":
                                break
                            else:
                                ... remaining logic that was under "if line:"

3) The decode of line in step 2 was taken from _clean_raw_bytes, so it needs removing from there so the line isn't decoded twice. A consolidated sketch of the resulting streaming flow is below.
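
Taken together, the three steps amount to parsing OpenAI-style streamed completion chunks. Here's a self-contained sketch of that flow; the function name, URL, and parameters are illustrative rather than the actual glados.py structure, and it accepts both JSON null and the string "null" for finish_reason since servers differ on which they send:

    import json
    import requests

    def stream_tokens(url: str, prompt: str):
        """Yield text tokens from an OpenAI-style streaming completions endpoint."""
        with requests.post(
            url,
            json={"prompt": prompt, "max_tokens": 128, "stream": True},
            stream=True,
            timeout=60,
        ) as resp:
            for raw in resp.iter_lines():
                if not raw:
                    continue
                line = raw.decode("utf-8")        # decode once, here only (step 3)
                if line == "data: [DONE]":        # end-of-stream sentinel (step 2)
                    break
                if line.startswith("data: "):
                    chunk = json.loads(line[len("data: "):])
                    choice = chunk["choices"][0]
                    # still streaming: finish_reason stays null (or the string "null")
                    # until the final chunk (step 1)
                    if choice.get("finish_reason") in (None, "null"):
                        yield choice.get("text", "")

    # Usage (placeholder LM Studio address):
    # for token in stream_tokens("http://192.168.1.50:1234/v1/completions", "Hello"):
    #     print(token, end="", flush=True)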

dnhkng commented 2 months ago

@EikaMikiku Thanks for that.

As I'm already supporting Win/Mac/Linux, and that is slowing things down, I'm not too keen on making a universal interface for every new LLM API.

That said, please make a PR if your changes work and don't cause problems!