Instructions for local models

MikeyBeez commented 1 year ago

Are there any instructions for using local models rather than GPT-3 or 4? Is there a way to set the basepath to 127.0.0.1:11435 to use ollama or to 1234/v2 for LM Studio? Is there a configuration file or environment variables to set for this? Thank you for sharing your wonderful software with the AI community.

github-actions[bot] commented 10 months ago

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

msveshnikov commented 10 months ago

Please, any news here?

yf007 commented 10 months ago

I am also looking for a solution to this problem.

Progaros commented 9 months ago

I was trying to get ollama running with AutoGPT.

curl works:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral:instruct",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
{"id":"chatcmpl-447","object":"chat.completion","created":1707528048,"model":"mistral:instruct","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hello there! I'm here to help answer any questions you might have or assist with tasks you may need assistance with. What can I help you with today?\n\nHere are some things I can do:\n\n1. Answer general knowledge questions\n2. Help with math problems\n3. Set reminders and alarms\n4. Create to-do lists and manage tasks\n5. Provide weather updates\n6. Tell jokes or share interesting facts\n7. Assist with email and calendar management\n8. Play music, set timers for cooking, and more!\n\nLet me know what you need help with and I'll do my best to assist!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":140,"total_tokens":156}}

but with this AutoGPT config:

## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=ollama

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL= http://localhost:11434/v1/chat/completions

## SMART_LLM - Smart language model (Default: gpt-4-0314)
SMART_LLM=mixtral:8x7b-instruct-v0.1-q2_K

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo-16k)
FAST_LLM=mistral:instruct

I can't get the connection:

File "/venv/agpt-9TtSrW0h-py3.10/lib/python3.10/site-packages/openai/_base_client.py", line 919, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

maybe someone will figure it out and can post an update here

msveshnikov commented 9 months ago

Connection is solvable via proxy, but then you will get pydantic errors everywhere because Mistral is producing wrong json

qwertyuu commented 9 months ago

If an openai-ready api is needed, I think you can go through litellm to make a bridge to your ollama instance: https://docs.litellm.ai/docs/providers/ollama

ketsapiwiq commented 9 months ago

Hi! I am still fighting with ollama to try proxying an agent on my own, but one important thing I want to mention is regarding this:

Connection is solvable via proxy, but then you will get pydantic errors everywhere because Mistral is producing wrong json

Can't we theoretically code an agent that uses GBNF grammar files for forcing Mistral or other local LLMs to produce correct JSON?

A simple example for correct JSON is viewable in the llama.cpp repo: https://github.com/ggerganov/llama.cpp/blob/master/grammars/json.gbnf Then you include the correct GBNF in your llama.cpp command (I figure it would be a problem if the Ollama API doesn't support it though).

There are even programs now that generate correct GBNF files based on JSON definitions: https://github.com/richardanaya/gbnf

ShrirajHegde commented 9 months ago

If an openai-ready api is needed, I think you can go through litellm to make a bridge to your ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought ollama supports OpenAI compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Wladastic commented 9 months ago

I got it to run with Mistral 7B AWQ, neural chat v3 AWQ and a few other models. Only thing is I had to write my own Auto-GPT from scratch as the prompts from Auto-GPT are too long and confusing for the local llms. They return correct prompts sometimes, but other times they concentrate so much on the system Prompt by Auto-GPT that they respond with "Hello, I am using the command ask_user to talk to the user, is this correct?" and then it says "Hello, how can I help you?" like 100 times until I cancel it.

My current use case of using oobabooga text-generation-webui works best when I add the JSON grammar to it. It then works with very basic prompts only and only a few commands, otherwise it kept making up new commands and started halucinating and responding with multiple commands at once etc.

k8si commented 9 months ago

I got it to make calls to a llamafile server running locally (which has an OpenAI-compatible API) by just setting OPENAI_API_BASE_URL=http://localhost:8080/v1 in my .env. I know the requests are getting through based on the debug logs (plus I can see the calls coming into my llamafile server).

However, since the model I'm using doesn't support function calling, the json it returns has null for the tool_calls field which results in ValueError: LLM did not call create_agent function; agent profile creation failed coming from here: https://github.com/Significant-Gravitas/AutoGPT/blob/a9b7b175fff94ab4a9c5d6c91537089199b27a09/autogpts/autogpt/autogpt/agent_factory/profile_generator.py#L202

Also, setting OPENAI_FUNCTIONS=False does not seem to do anything.

If anyone knows of an open source gguf or lllamafile-format model that supports function calling, let me know. That might fix this issue?

Wladastic commented 9 months ago

Well instead of using OPENAI API use one of the numerous API plugins or check the OPENAI Gpt base plugin in the code. I havent got any local model to fully work with Auto-GPT as GPT-4 can hold the context length without getting too focused on it, but other models that work do focus too much on the prompt given to the llm then. Mistral for example keeps talking about the constraints that it gets and that it tries to oblige to them etc. I am currently trying to build something similar to this project that uses multiple agent calls for each step to somehow accommodate the lack of context, but it is a bit slow as sometimes an agent gets very stubborn on their point of view.

cognitivetech commented 8 months ago

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Wladastic commented 8 months ago

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Hermes 2 Pro works well but I would rather wait for another Version of this based on Mistral 7B v.0.2 as Hermes 2 Pro is based on v0.1 which is only trained on 8k context length and v0.2 is trained on 32k.

I also think capybara Hermes 2.5 Q8_0 works very well for me, only that it sometimes doesnt understand why a JSON was wrong. Maybe some other LLM would come along that is cleaner than Mistral 7B Instruct v.0.2 as that version is horrible to use currently.

Also set n_batch to 1024 at least, or 2048, this way Auto-GPT runs best so far. Not on par with GPT-3.5-Turbo but it works kindof. The function calling could be implemented from here though: https://github.com/NousResearch/Hermes-Function-Calling

qwertyuu commented 8 months ago

If an openai-ready api is needed, I think you can go through litellm to make a bridge to your ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought ollama supports OpenAI compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

ketsapiwiq commented 8 months ago

A bit off-topic but this project gained a lot of traction lately and works with Hermes Pro or Mistral/Mixtral, it doesn't have many agents yet (web search, main planning loop, and RAG) but it works super well, maybe interesting to study: https://github.com/nilsherzig/LLocalSearch

cognitivetech commented 8 months ago

savage

ZhenhuiTang commented 7 months ago

If an openai-ready api is needed, I think you can go through litellm to make a bridge to your ollama instance: https://docs.litellm.ai/docs/providers/ollama

@qwertyuu, I thought ollama supports OpenAI compatible API without LiteLLM (https://ollama.com/blog/openai-compatibility). Am I missing something?

Damn! Good to know.

Have you been using local LLMs with the mentioned compatible API successfully？

Docteur-RS commented 6 months ago

Now that ollama is openai compatible we should be able to trick autoGPT by setting OPENAI_API_BASE_URL=http://localhost:11434/v1.
Unfortunatly there are still 2 issues here:

The model name has to be something that is an existing proprietary model string like "gpt4-turbo" or whatever. So using "mistral:latest" isn't working.
Faking the api-key seems to to hurt autoGPT. I'm not sure what it's checking but "hello world" as API key won't fly.

ntindle commented 6 months ago

Should be pretty simple to add a new provider for ollama by copy pasting open ai and removing parts not needed

Docteur-RS commented 6 months ago

Should be pretty simple to add a new provider for ollama by copy pasting open ai and removing parts not needed

Huuum. I don't even know where is the provider file located.

But let's pretend I could duplicate the provider. Is it really worth it ? I can't find anywhere in the documentation any tips about running tools with local models. And honnestly tool calling is a real must have to achieve anything !

I just feel like autoGPT isn't oriented toward local models support anyway. Considering alternatives like CrewAI and Autogen which both have documentation and local tool calling support might be a better choice for the moment.
I feel like autoGPT is a bit like langgraph. It has an ollama plugin but the ollama tool calling is outdated and never got out of beta. It doesn't feel safe to invest time in this one right now IMO.
All I can read everywhere is OPENAI OPENAI OPENAI OPENAI...

I hope that this project gets better support for running local models soon. It seems nice ;-)

Wladastic commented 6 months ago

Oh I thought this was done already. Was stuck so started reading the Auto-GPT docks when I was like hang on a sec, is this still "Open"AI only? Oh my!

I've been experimenting with both, AutoGen and CrewAI, and function calling is still an issue across the board as far as I'm aware.

I was thinking that maybe we could crowd-fine-tune llama3 on some synthetic text-to-code TTC 😅 example data set and perhaps all AutoGen, CrewAI and other new package documentations.

Would that work? If yes then let's hivemind our GPUs and get it done 🙌

I made a proof of concept version you can try and augment: https://github.com/Wladastic/mini_autogpt

BradKML commented 5 months ago

@Docteur-RS there needs to be an open source tool bag for things like web crawling and math, but are there existing libraries for it? https://github.com/Significant-Gravitas/AutoGPT/issues/6947

Pwuts commented 5 months ago

We're almost done with a llamafile integration (#7091). Ollama could be a good addition too, and should be pretty easy to integrate by subclassing the _BaseOpenAIProvider.

Significant-Gravitas / AutoGPT

Instructions for local models #6336