cheshire-cat-ai / docs

Documentation for the Cheshire Cat AI
https://cheshire-cat-ai.github.io/docs/

Add section to docs on how to use own LLMs - link in settings goes to 404 #96

Closed ewebgh33 closed 8 months ago

ewebgh33 commented 8 months ago

Can a section please be added to the docs on using our own local LLMs? Has no one asked this yet? It's not even in the FAQ let alone "set up" or "running" etc.

Do I need a standalone install of llama-cpp to connect to? The Cat doesn't seem to connect to the URL created by running textgen-webui (Oobabooga). Also, what is the difference between "custom LLM" and "self hosted"? Aren't these the same thing? If you're running llama-cpp, that's "custom" too, isn't it?

Anyway, when you select "Custom LLM" in the options, the description says "LLM on a custom endpoint. See docs for examples." But there are no examples in the docs. There is a green link icon next to the description, and it goes to a 404 Not Found. It looks like it's supposed to go to https://cheshirecat.ai/2023/08/19/custom-large-language-model/ but that doesn't exist, as I get the 404. Is it supposed to be going to this link instead? https://cheshirecat.ai/custom-large-language-model/

Is that link up to date? I didn't think I needed to code a custom REST API to connect to all the LLMs I am already using via textgen-webUI.

In the instructions to set this up, where do we install all this, given that we're running in Docker? Are you saying to set up a custom REST API in some other venv of our choosing?
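For anyone else landing here, a minimal sketch of what such a standalone endpoint might look like. The request/response field names ("text"), the port, and the file name are assumptions taken from the blog post's description, not a documented schema; check the custom LLM adapter's source for the real contract.

```python
# server.py - hypothetical standalone backend for the "Custom LLM" option.
# Assumption: the Cat's custom adapter POSTs a JSON body with the prompt
# under a "text" key and reads the reply from a "text" key in the response.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class PromptRequest(BaseModel):
    text: str  # the prompt forwarded by the Cat (assumed field name)


@app.post("/")
def generate(req: PromptRequest) -> dict:
    # Call whatever backend you like here (llama-cpp, textgen-webui, ...).
    answer = f"You said: {req.text}"  # placeholder generation
    return {"text": answer}           # assumed response field name

# Run with:  uvicorn server:app --host 0.0.0.0 --port 5000
# then point the "Custom LLM" url at an address the Cat's container can reach.
```

This server runs outside the Cat's container (any venv or machine), which seems to be what the blog post means by a "custom endpoint".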

Please, note that the Cheshire Cat is running inside a Docker container. Thus, it has its own network bridge called docker0. Once you start the Cat's container, your host machine (i.e. your computer) is assigned an IP address under the Docker network. Therefore, you should set the url parameter accordingly.
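To make that quoted paragraph concrete: from inside the Cat's container, 127.0.0.1 refers to the container itself, so a server running on the host generally has to be reached via host.docker.internal or the docker0 gateway address instead. Below is a quick reachability probe to run from inside the container; the port (5000, textgen-webui's default) and the /v1/models path (its OpenAI-compatible extension) are assumptions about the setup.

```python
# Reachability probe, meant to be run from inside the Cat container
# (e.g. via `docker exec -it <cat_container> python3`).
import urllib.request

candidates = [
    "http://host.docker.internal:5000",  # Docker Desktop / WSL2 setups
    "http://172.17.0.1:5000",            # typical docker0 gateway on Linux
]

for base in candidates:
    try:
        # /v1/models assumes textgen-webui's OpenAI-compatible API is enabled
        with urllib.request.urlopen(base + "/v1/models", timeout=3) as resp:
            print(base, "reachable, HTTP", resp.status)
    except Exception as exc:
        print(base, "not reachable:", exc)
```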

This is just confusing, and it's the problem with having something "extendable" inside Docker: extendable, but you have to jump through hoops and an extra set of complications to extend it.

I would like to say, one of the attractions of Cheshire is that it has many/most of the features I want (RAG, embeddings choice, use of OpenAI or Local, etc), and all in a GUI. No CLI needed when I want to change settings. With that in mind, have you seen how SillyTavern connects to a running LLM (ooba) with just one click?

Now, SillyTavern is not my cup of tea, but the ease of connecting to ANY local LLM already running is amazing. Would you consider making the connection to local LLMs in Cheshire a little easier? As I said, the main attraction is ease of use in getting LLM+RAG working; with that in mind, an easier connection to a local LLM would go a long way toward greater adoption of the Cat.

It's confusing because out of the box, with an OpenAI key, this is a very accessible GUI for local RAG. But as soon as you want to try a local LLM, it's horribly complicated and beyond any non-professional dev, I dare say.

Thanks

nicola-corbellini commented 8 months ago

Can a section please be added to the docs on using our own local LLMs?

Yes, some tutorials could and should be added to the Tutorials section.

Also, what is the difference between "custom LLM" and "self hosted". Aren't these the same thing? If you're running llama-cpp that's "custom" too?

I guess the main difference is that a "custom" one is an LLM running under a custom endpoint, but it is not necessarily local. Like in this case.

Is it supposed to be going to this link? https://cheshirecat.ai/custom-large-language-model/

Is that link up to date?

Yes, the link was supposed to point to the one you provided. If you don't mind, you could make a PR in the core repository to fix that. Btw, the link is up to date.

Would you consider making the connection to local LLMs in Cheshire, a little easier?

Sorry, but I don't know how textgen-webui (Oobabooga) works, and this strongly depends on your needs, but we have a repository with a working setup for local models. How would you like the setup to be?

Useful links:

ewebgh33 commented 8 months ago

Thanks, an easier-to-use local setup would be greatly appreciated. I can't get my head around how to set up the API to talk to the Docker container.

I took a look at Local Cat... Ollama is Mac and Linux only, neither of which I have running for LLMs, so sadly Local Cat won't work for me. CPU is also slow, so thanks for posting the 2nd link, but I would rather not deal with CPU generation. GPU is the way to go for me.

If I wanted to use OpenAI, Cheshire would be incredible. But I really want to use more local models, and at the moment it's just too confusing to try to make that work with Cheshire. I hope someone may take another look at this in future! :) Otherwise I would be telling EVERYONE to use Cheshire...

pieroit commented 8 months ago

@EmmaWebGH are you available to check the TextGenerationWebUI adapter when we integrate it?

ewebgh33 commented 8 months ago

@EmmaWebGH are you available to check the TextGenerationWebUI adapter when we integrate it?

Yes, certainly!

ewebgh33 commented 8 months ago

Hi there. Does having this closed mean textgen-webui is integrated and tested?

The thing I can't work out is the API URL and how to get Cheshire to accept it. Supposedly the textgen-webui API is compatible with the OpenAI API format. The URL it reports when textgen-webui is launched is: OpenAI-compatible API URL: http://127.0.0.1:5000

But since Cheshire is in Docker, I've also tried: http://host.docker.internal:5000/

Now, the key? I've seen people say that the "fake" key to use with textgen-webui is OPENAI_API_KEY=sk-111111111111111111111111111111111111111111111111, but I don't know where that was documented, if it's even true. I tried it anyway.

So far no combination of the API URL and key/no-key works for me.
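For reference, a direct call to the OpenAI-compatible endpoint, bypassing the Cat entirely, can separate a networking problem from a settings problem. The base URL, model name, and dummy key below are placeholders; the assumption is that textgen-webui's OpenAI-compatible API is enabled and no API key is enforced.

```python
# Sanity check against textgen-webui's OpenAI-compatible API, independent of the Cat.
# Requires the openai>=1.x Python client.
from openai import OpenAI

client = OpenAI(
    base_url="http://host.docker.internal:5000/v1",  # use 127.0.0.1 when run on the host itself
    api_key="sk-111111111111111111111111111111111111111111111111",  # dummy key; assumed not enforced
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; textgen-webui serves whatever model is currently loaded
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(resp.choices[0].message.content)
```

If this script works from inside the Cat's container but the Cat's settings still fail, the problem is in the configuration; if it fails too, it is networking.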

I'm looking into running Ollama via WSL, but honestly this is a PITA :) The ability to use APIs served by a variety of LLM apps would only be a benefit, IMO anyway.

ewebgh33 commented 8 months ago

I have Ollama running in WSL - is it possible for the WSL version to serve LLMs to Cheshire, rather than the Docker version? The docs only describe how to use the Docker version of Ollama.

pieroit commented 8 months ago

I have Ollama running in WSL - is it possible for the WSL version to serve LLMs to Cheshire, rather than the Docker version? The docs only describe how to use the Docker version of Ollama.

@EmmaWebGH try this:

ewebgh33 commented 8 months ago

I'll give that a go shortly

The reason I didn't think this would work is that it's the exact same process/expected behaviour as with textgeneration-webui (Oobabooga), and adding the url/port for that one does nothing. Both are supposed to be compatible with the OpenAI API format, but the textgeneration-webui API doesn't work, so I didn't expect Ollama to either, and Ollama isn't mentioned in the docs.

I'll give it a go when I get back to the PC and report back. Thanks!

ewebgh33 commented 7 months ago

Sorry to keep this going

I updated the Docker container for Cheshire this morning.

What I did: I finally tried this, connecting Cheshire to Ollama via http://127.0.0.1:11434. Cheshire pops up a green box and says "Language model provider updated successfully".

What happens: I ask a simple first question like "Who are you?" and Cheshire responds with "AI: You did not configure a Language Model. Do it in the settings!" I also tried http://localhost:11434, http://localhost:11434/api/generate and http://localhost:11434/api/chat, which is what the Ollama API docs said to use.

For all of these, Cheshire shows a green box in the top right of the browser saying the update was "successful", but the chat responds that no model is configured. That seems like conflicting information. Why this behaviour?

Does the green box come up simply because I added literally anything, without actually validating the connection I entered? Hmm, confirmed: I just entered "donuts" and it said it was updated successfully. This seems like a major UI/design flaw, because nothing about entering "donuts" is successful.

Also, now that Ollama is an option in the LLM models setup, this is out of date, correct? https://cheshirecat.ai/local-models-with-ollama/

Final note: Cheshire is in Docker, Ollama is on Windows via WSL/Ubuntu. I tried the IP of my WSL instance and also the IP of my local machine, and those don't work either. (IPs from a basic cmd > ipconfig, which lists IPs for WSL and Ethernet.)

Ollama shows as running when the browser is pointed at http://127.0.0.1:11434/ (the very first thing I tried, as above), so I would imagine it's not Ollama that is the problem here.
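A note for anyone hitting the same wall: Ollama binds to 127.0.0.1 by default, so an instance inside WSL usually isn't reachable from a Docker container unless it is started with OLLAMA_HOST=0.0.0.0 (or a similar non-loopback address). Also, adapters built on the Ollama API generally expect just the base URL and append /api/generate or /api/chat themselves, so the plain http://host:11434 form is usually the right thing to enter. A small probe from inside the Cat container, with host.docker.internal as an assumed hostname (it resolves on Docker Desktop for Windows):

```python
# Probe the Ollama API from inside the Cat container.
# Assumptions: Ollama was started with OLLAMA_HOST=0.0.0.0 inside WSL so it
# listens beyond 127.0.0.1, and host.docker.internal resolves to the host.
import json
import urllib.request

base_url = "http://host.docker.internal:11434"

# /api/tags lists the locally pulled models; a successful response means the
# container can actually see Ollama, whatever the Cat's settings form says.
with urllib.request.urlopen(base_url + "/api/tags", timeout=3) as resp:
    models = json.load(resp)

print([m["name"] for m in models.get("models", [])])
```

If the model names print, the container can reach Ollama and the remaining issue is in the Cat's LLM settings rather than in the networking.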

Update: I got Ollama in WSL working with CrewAI, so I assume my troubles are either to do with the WSL-Ollama/Docker-Cheshire combo or something else.