Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

Support Koboldcpp #513

Closed · Botoni closed this issue 6 months ago

Botoni commented 10 months ago

Hi, would it be possible to support koboldcpp? It is faster and loads more models than LM Studio, and it has better compatibility with Linux.

In fact, it already connects if I use the LM Studio option and enter the koboldcpp address, but the responses in the chat get truncated at the first or second word, even though the koboldcpp terminal shows the response being fully generated.

Botoni commented 9 months ago

If it helps the implementation: using the LM Studio config, the koboldcpp terminal gives this output when sending a prompt:

Processing Prompt [BLAS] (509 / 509 tokens)
Generating (10 / 100 tokens) (EOS token triggered!)
ContextLimit: 519/2048, Processing:11.37s (22.3ms/T), Generation:1.19s (118.9ms/T), Total:12.56s (0.80T/s)
Output: Hello! How can I help you today?
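
To help narrow down where the cut happens, here is a minimal sketch that streams a chat completion directly from koboldcpp and prints every delta it receives. It assumes the default port 5001 and that koboldcpp's OpenAI-compatible endpoint uses the usual `data: ...` / `[DONE]` SSE streaming format; if the full sentence arrives here but not in the chat window, the truncation is happening on the client side.

```python
# Sketch: stream a chat completion straight from koboldcpp's OpenAI-compatible API.
# Assumes the default koboldcpp port (5001); adjust BASE_URL if yours differs.
import json
import requests

BASE_URL = "http://localhost:5001/v1"

payload = {
    "model": "koboldcpp",  # placeholder; koboldcpp serves whichever model it has loaded
    "stream": True,
    "messages": [{"role": "user", "content": "hello"}],
    "temperature": 0.7,
}

with requests.post(f"{BASE_URL}/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw:
            continue
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        print(delta, end="", flush=True)
print()
```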

intulint commented 7 months ago

I couldn't connect to koboldcpp. I thought support for such a convenient backend would be built in right away. How can I connect?

zacanbot commented 6 months ago

Koboldcpp says its API is OpenAI compatible. But if I configure LocalAI or LM Studio endpoints to point to Koboldcpp, I get the same truncation experience as the OP. Maybe it is a configuration issue in Koboldcpp?
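
As a sanity check of that compatibility claim outside of AnythingLLM, here is a minimal sketch using the official openai Python client pointed at koboldcpp; the base URL, dummy API key, and model name are assumptions for a default local setup. If this prints the full sentence, the backend is behaving like a normal OpenAI endpoint and the truncation is somewhere in the connector's handling of the response.

```python
# Sketch: talk to koboldcpp as if it were the OpenAI API (non-streaming).
# Assumes the default koboldcpp port 5001; the API key is just a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="koboldcpp",  # placeholder; koboldcpp serves whichever model it has loaded
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```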

One motivation I can add for Koboldcpp support, other than it being a really convenient and configurable LLM engine, is that it is the only way to get hardware acceleration for older AMD cards that are not officially supported by ROCm (I have an RX 6600).

kenhuang commented 6 months ago

https://petstore.swagger.io/?url=https://lite.koboldai.net/kobold_api.json

Koboldcpp API reference

timothycarambat commented 6 months ago

If it's OpenAI compatible, can't the Generic OpenAI connector (the last LLM connector) work here?

zacanbot commented 6 months ago

Thank you for adding the Koboldcpp connection options. However, can we re-open the issue? The original truncation issue still persists with the latest version of AnythingLLM using the new Koboldcpp connector:

[screenshot: truncated chat response in AnythingLLM]

While in the Koboldcpp server logs, I see that the whole message is generated:

...
Input: {"model": "koboldcpp/Meta-Llama-3-8B-Instruct-Q5_K_M", "stream": true, "messages": [{"role": "system", "content": "Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed."}, {"role": "user", "content": "hello"}], "temperature": 0.7}

Processing Prompt [BLAS] (61 / 61 tokens)
Generating (100 / 100 tokens)
CtxLimit: 161/8192, Process:0.01s (0.2ms/T = 4066.67T/s), Generate:2.88s (28.8ms/T = 34.76T/s), Total:2.89s (34.58T/s)
Output: Hello! How can I assist you today? What's on your mind?
...
timothycarambat commented 6 months ago

@shatfield4

shatfield4 commented 6 months ago

@zacanbot Can you give me any more information on how to replicate this bug? I have downloaded the same Llama3 model you are using and the streaming is working fine for me and showing the entire message inside AnythingLLM. Are you running the latest version of KoboldCPP? Did you change any config settings inside KoboldCPP?

zacanbot commented 6 months ago

I just updated to the latest version (1.64) and it seems to be working correctly now! Thanks for digging into this. Appreciated 👍

intulint commented 5 months ago

[screenshots: Koboldcpp connector settings in AnythingLLM with an empty model selector]

For some reason it's not working for me again. I just downloaded a new version of AnythingLLM, and Koboldcpp is version 1.65. It doesn't let me select a model; there's just an empty window. Apparently something has disappeared again.

timothycarambat commented 5 months ago

Then this is likely because whatever you have put in as the base URL is not correct. Does http://localhost:5001/v1/models even return data in the browser?

cc @shatfield4

intulint commented 5 months ago


Yes, the browser opens the link http://localhost:5001/v1/models, and I can also fetch the values in Python through the API. Should the base URL be "http://localhost:5001/v1"? That is the path written in the tooltip.
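
Roughly what that Python check looks like, as a sketch (only the default port is assumed, and the response is assumed to follow the OpenAI-style {"data": [...]} model list format):

```python
# Sketch: confirm /v1/models answers and list the model IDs it reports.
import requests

resp = requests.get("http://localhost:5001/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```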

Negatrev commented 5 months ago

> Yes, the browser opens the link http://localhost:5001/v1/models, and I can also fetch the values in Python through the API. Should the base URL be "http://localhost:5001/v1"? That is the path written in the tooltip.

Exact same issue for me. Koboldcpp has a few different API options and none of them load with AnythingLLM, but other clients, including KoboldAI, KoboldAI Lite, and SillyTavern, can all use it without issue.

attyru commented 3 months ago

> Exact same issue for me. Koboldcpp has a few different API options and none of them load with AnythingLLM, but other clients, including KoboldAI, KoboldAI Lite, and SillyTavern, can all use it without issue.

I managed to work around this problem as follows: instead of http://localhost:5001/v1 I set http://127.0.0.1/v1 and everything worked.
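
One possible explanation for why that workaround helps, offered as an assumption rather than something confirmed in this thread: "localhost" can resolve to the IPv6 address ::1 first, and if the backend is only listening on IPv4, connections to the hostname fail or behave oddly while 127.0.0.1 works. A quick sketch to see what localhost resolves to on a given machine:

```python
# Sketch: show whether "localhost" resolves to IPv6 (::1), IPv4 (127.0.0.1), or both,
# and in which order; a mismatch with what the server binds to can explain the fix above.
import socket

for family, _, _, _, sockaddr in socket.getaddrinfo("localhost", 5001, proto=socket.IPPROTO_TCP):
    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
    print(label, sockaddr[0])
```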