Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
https://agpt.co
MIT License

Support using other/local LLMs #25

Closed DataBassGit closed 37 minutes ago

DataBassGit commented 1 year ago

You can modify the code to accept a config file as input, and read the Chosen_Model flag to select the appropriate AI model. Here's an example of how to achieve this:

Create a sample config file named config.ini:

[AI]
Chosen_Model = gpt-4

Offload the call_ai_function from ai_functions.py to a separate library. Modify the call_ai_function function to read the model from the config file:

import configparser
import openai

def call_ai_function(function, args, description, config_path="config.ini"):
    # Load the configuration file
    config = configparser.ConfigParser()
    config.read(config_path)

    # Get the chosen model from the config file
    model = config.get("AI", "Chosen_Model", fallback="gpt-4")

    # Parse args to comma separated string
    args = ", ".join(args)
    messages = [
        {
            "role": "system",
            "content": f"You are now the following python function: ```# {description}\n{function}```\n\nOnly respond with your `return` value.",
        },
        {"role": "user", "content": args},
    ]

    # Use different AI APIs based on the chosen model
    if model == "gpt-4":
        response = openai.ChatCompletion.create(
            model=model, messages=messages, temperature=0
        )
    elif model == "some_other_api":
        # Add code to call another AI API with the appropriate parameters
        response = some_other_api_call(parameters)
    else:
        raise ValueError(f"Unsupported model: {model}")

    return response.choices[0].message["content"]

In this modified version, call_ai_function takes an additional parameter, config_path, which defaults to "config.ini". The function reads the config file, retrieves the Chosen_Model value, and uses it as the model for the OpenAI API call. If the Chosen_Model flag is not found in the config file, it defaults to "gpt-4".

The if/elif structure is used to call different AI APIs based on the chosen model from the configuration file. Replace some_other_api with the name of the API you'd like to use, and replace parameters with the appropriate parameters required by that API. You can extend the if/elif structure to include more AI APIs as needed.
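
For illustration, here's roughly how a caller would use the modified function. The example function and arguments below are made up, and this assumes a valid OpenAI key is configured:

result = call_ai_function(
    function="def add(a: int, b: int) -> int:",
    args=["3", "4"],
    description="Adds two integers.",
    config_path="config.ini",
)
print(result)  # With gpt-4 selected in config.ini, this should print "7"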

Torantulino commented 1 year ago

Excellent work. Lots of people are asking for this, so please submit a pull request!

In order to fully support GPT-3.5 (and other models) we need to harden the prompt.

@Koobah had some success by adding this line to the end of prompt.txt:

Before submitting the response, simulate parsing the response with Python json.loads. Don't submit unless it can be parsed.

This would also help out #21

DataBassGit commented 1 year ago

I see you looking over my shoulder. Thoughts?

I've got a friend who is going to clone the branch and test for me. (hopefully) I don't have a working environment atm.

Koobah commented 1 year ago

I also changed the user prompt from NEXT COMMAND to GENERATE NEXT COMMAND JSON, basically reminding it to use JSON whenever possible. It's still not 100% working, though.

DataBassGit commented 1 year ago

https://github.com/DataBassGit/Auto-GPT/blob/master/scripts/ai_function_lib.py

@Koobah This is basically what I'm working with atm. I think we can probably add a verify_json function to a gpt-3.5 segment of that function.
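
Something like this minimal sketch is what I have in mind for verify_json (just a parse check; any retry/repair logic would live in the caller):

import json

def verify_json(response_text: str) -> bool:
    # Return True only if the model's response parses as valid JSON.
    try:
        json.loads(response_text)
        return True
    except json.JSONDecodeError:
        return False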

DataBassGit commented 1 year ago

Pull request on this is submitted. I'm going to start looking into more models and platforms that can be incorporated.

ResourceHog commented 1 year ago

Would be huge if it can run llama.cpp locally.

DataBassGit commented 1 year ago

I might be able to swing that. Let's see if this merge gets approved. I'm also looking at implementing GPT4all.

MarkSchmidty commented 1 year ago

@DataBassGit I see that PR got closed. What's the status of your fork?

DataBassGit commented 1 year ago

They moved the API call to GPT-4 to an external library in main.py, but there are still some scripts that call openai directly, like chat.py, browse.py etc.

GPT4all doesn't support x64 architecture. I also tried some APIs on Hugging Face, but they seem to truncate responses on the free API endpoints.

I just converted BabyAGI to Oobabooga, but it's untested. Should be getting to that tonight. If it works, I will start working on porting AutoGPT to Oobabooga as well. The nice thing about this method is that it allows for local or remote hosting, and can handle many different language models without much issue.

Hugging Face should work, though. I need to review Microsoft/Jarvis; they make heavy use of HF APIs.

MarkSchmidty commented 1 year ago

GPT4all supports x64 and every architecture llama.cpp supports, which is every architecture (even non-POSIX, and WebAssembly). Their motto is "Can it run ~Doom~ LLaMA" for a reason.

Ooga supports GPT4all (and all llama.cpp ggml models), since it packages llama.cpp and the llamacpp python bindings library. So porting it to ooba would effectively resolve this.

DataBassGit commented 1 year ago

The Python Client for gpt4all only supports x86 Linux and ARM Mac.

MarkSchmidty commented 1 year ago

I'm running it on x64 right now.

MarkSchmidty commented 1 year ago

To be clear, the x86 architecture for gpt4all should really be called x86/x64. It supports either.

But none of the gpt4all libraries are required to run inference with gpt4all. They have their own fork to load the pre-prompt automatically. But you can load the same pre-prompt with one click in ooba's UI with standard llama.cpp and the regular llama.cpp python bindings as a back-end. You don't need anything but the model .bin and ooba's webui repo.

alkeryn commented 1 year ago

Wouldn't it be simpler to just make an API call to ooba's GUI instead of managing the loading of models? It may be easier to just have a standardized API so you don't have to care about implementation details.

MarkSchmidty commented 1 year ago

Ooba's UI is a lot of overhead just to send and receive requests from a different model.

AFAIK ooba supports two types of models, HuggingFace models and GGML (llama.cpp) models (like GPT4All). The former with HuggingFace libraries and the latter with these python bindings: https://github.com/thomasantony/llamacpp-python for llama.cpp

Adding even basic support for just one of these would surely bring in waves of developers who want local models and who would then contribute to improving Auto-GPT.
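
For reference, a minimal local-generation sketch on the HuggingFace side (the model path is just a placeholder for whatever HF-format checkpoint you have on disk):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/local-llama-hf"  # placeholder: any HF-format causal LM checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "GENERATE NEXT COMMAND JSON:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))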

alkeryn commented 1 year ago

@MarkSchmidty I see what you mean. Though maybe it would be simpler to make a separate project that exposes a standardized API, and maybe some extensibility through plugins, and not much else, so that other projects can just use the API without having to care about how to implement the various models and techniques.

Either way, in the long run I think it may be better if we have a standard API that was well thought out. Just like language servers made our editors nicer, it would be nice to have a standardized LLM (or even AI) API.

If we could avoid fragmentation that'd be great, and there is no better time than now to do so.

alkeryn commented 1 year ago

Well, in the meantime I think I'll fork it to use LLaMA instead. I've got GPT-4 access, but I like the idea of being able to let it run for a very long time without worrying about cost or API overuse.

MarkSchmidty commented 1 year ago

Well, in the meantime I think I'll fork it to use LLaMA instead. I've got GPT-4 access, but I like the idea of being able to let it run for a very long time without worrying about cost or API overuse.

I think a lot of people want this but just don't know it yet. There are lots of interesting use cases that would rack up a huge OpenAI bill, but that LLaMA-30B or 65B can probably handle fine for just the cost of powering a 150 W, $200 Nvidia Tesla P40.

DataBassGit commented 1 year ago

There's an issue with this. Auto-GPT relies on specifically structured prompts in order to function correctly, and LLaMA does not do well at producing responses in the exact format that is required. Vicuna does a much better job. It's not perfect, but it could probably get there with some fine-tuning.

I have a fork of an older version of Auto-GPT that I am planning to hook up to vicuna. Right now, I am waiting on Oobabooga to fix a bug in their API. I've been working with BabyAGI at the moment because it is simpler than AutoGPT. Once BabyAGI is working, I will migrate the changes to AutoGPT as well.

MarkSchmidty commented 1 year ago

With minimal finetuning LLaMA can easily do better (yes better*) than GPT-4. Finetuning goes a long way and LLaMA is a very capable base model. The Vicuna dataset (ShareGPT) is available for finetuning here: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/tree/main/HTML_cleaned_raw_dataset

The ideal finetuning would be based on a dataset of GPT-4's interactions with Auto-GPT though.

*To be fair, GPT-4 could do better than it already does "out of the box" with a few tweaks like using embeddings, but that is beside the point.

DataBassGit commented 1 year ago

@MarkSchmidty We absolutely could dump the prompts and responses from AutoGPT to a file and use that for fine-tuning Vicuna. I don't have a GPU, however, so I'm not able to perform the operation, and I don't have the money for GPT-4.

alkeryn commented 1 year ago

@DataBassGit Yes, this is what I found while trying to implement it, and that's before the Pinecone update. After the Pinecone update there is an additional use of OpenAI to generate embeddings, which would also need to be handled differently.

DataBassGit commented 1 year ago

@alkeryn I managed to do this using the sentence_transformers library. This appears to work for Vicuna and Pinecone, but you have to change your index dimensions from 1536 to 768 on Pinecone. I think the model dictates the index dimensions; I couldn't find a way to adjust the dimensions otherwise.



from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/LaBSE')

def get_ada_embedding(text):
    # Get the embedding for the given text
    embedding = model.encode([text])
    return embedding[0]
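
For context, the dimension change happens when the Pinecone index is created; a rough sketch with the standard pinecone client (the API key, environment, and index name are placeholders):

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-east1-gcp")
if "auto-gpt" not in pinecone.list_indexes():
    # LaBSE produces 768-dimensional vectors, vs. 1536 for OpenAI's ada-002.
    pinecone.create_index("auto-gpt", dimension=768, metric="cosine")
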
MarkSchmidty commented 1 year ago

@alkeryn I managed to do this using the sentence_transformers library. This appears to work for Vicuna and Pinecone, but you have to change your index dimensions from 1536 to 768 on Pinecone. I think the model dictates the index dimensions; I couldn't find a way to adjust the dimensions otherwise.


model = SentenceTransformer('sentence-transformers/LaBSE')

def get_ada_embedding(text):
    # Get the embedding for the given text
    embedding = model.encode([text])
    return embedding[0]

Awesome!

There are offline embedding replacements for Pinecone that might be more ideal. For example, https://github.com/wawawario2/long_term_memory is a fork of ooba which produces and stores embeddings locally using zarr and NumPy. See https://github.com/wawawario2/long_term_memory#how-it-works-behind-the-scenes

DataBassGit commented 1 year ago

Unfortunately, that would take a lot of chopping to apply to what I am using it for. This is designed for the webui, which I am not using. We're loading ooba in API mode so no --chat or --cai-chat flag.

python server.py --auto-devices --listen --no-stream

This is how I am initiating the server.
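
Once it's up, calls from Python look roughly like this. This is only a hedged sketch: the /api/v1/generate route and port 5000 assume the webui's API extension is enabled, and both vary across versions, so check your build:

import requests

resp = requests.post(
    "http://localhost:5000/api/v1/generate",  # assumed route/port; varies by version
    json={"prompt": "GENERATE NEXT COMMAND JSON:", "max_new_tokens": 200},
    timeout=120,
)
print(resp.json()["results"][0]["text"])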

MarkSchmidty commented 1 year ago

Right, it would have to be re-implemented specifically for Auto-GPT. I just thought I'd point out that it is a future possibility.

I suppose local embeddings is a separate issue / feature we can look into.

DataBassGit commented 1 year ago

I don't expect that @Torantulino will implement local anything. That opens a window for lots of bugs and extra tech debt that he doesn't need. My intention was only to add the capacity for others to replace the api library with one of their own choosing, with the understanding that it's not supported. Thus offloading the api calls to a separate library would give us the ability to build an API interface for whatever we needed, without him needing to support it.
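
To make the idea concrete, here's a minimal sketch of what that separate library could look like (the wrapper below is hypothetical, not Auto-GPT's actual internals):

import openai

def create_chat_completion(messages, model="gpt-4", temperature=0):
    # Single choke point for chat calls: swapping backends means replacing this
    # function (or pointing openai.api_base at a different server).
    response = openai.ChatCompletion.create(
        model=model, messages=messages, temperature=temperature
    )
    return response.choices[0].message["content"]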

MarkSchmidty commented 1 year ago

Neither of us can read @Torantulino's mind.

But if you're right and people want this functionality as much as I suspect they do, either a wave of enthusiastic support for it will sway his mind or the interest will turn into a fork more capable than Auto-GPT (due to the benefits outlined in #348).

drusepth commented 1 year ago

Been following this thread while I implement local models in babyagi, but just wanted to pop in and voice my desire to see local models in this project, too. OpenAI is easy to use and implement, but local models have huge benefits in price and customization which seem paramount to optimize for in projects like these.

MarkSchmidty commented 1 year ago

Been following this thread while I implement local models in babyagi, but just wanted to pop in and voice my desire to see local models in this project, too. OpenAI is easy to use and implement, but local models have huge benefits in price and customization which seem paramount to optimize for in projects like these.

Local embeddings have these benefits and more as well.

If you like the sound of that, check out my meta-feature request for #348 Fully Local/Offline Auto-GPT and give it a boost. :)


Torantulino commented 1 year ago

I like the idea of running it offline too, we're looking into it! It would make Auto-GPT that much more accessible.

Thanks for the outstanding interest.

9cento commented 1 year ago

+1

DataBassGit commented 1 year ago

I've been working on making an offline port of BabyAGI because it's a much simpler project. One of the issues with offline ports is that each model has a different input format for interpretation. It can also change based on the host API interface. Does it take plain text? JSON?

You also have different token windows per model and different dimensions per memory index. I'm not sure that building a universal AGI that can interface with many different models, without having to rebuild from scratch, is feasible.

MarkSchmidty commented 1 year ago

There are already OpenAI API server clones for local models like LLaMA/Alpaca/Vicuna/GPT4All. All you need to do is change the endpoint to point at one. It's very much feasible.
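
A minimal sketch of what "change the endpoint" means with the openai Python client (the URL below is a placeholder for whichever local clone you run, and some clones only implement the plain Completions route):

import openai

openai.api_base = "http://localhost:8000/v1"  # placeholder: your local API clone
openai.api_key = "sk-local"  # most local servers ignore the key

response = openai.ChatCompletion.create(
    model="local-model",  # typically ignored or remapped by the local server
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0,
)
print(response.choices[0].message["content"])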

DataBassGit commented 1 year ago

@MarkSchmidty I've been looking for one. We've tried llama.cpp, oobabooga, and huggingface. Is there another option you have working with vicuna?

alreadydone commented 1 year ago

Maybe this? Maybe I have seen one with more stars but I don't remember where.

9cento commented 1 year ago

Are we talking about a local server, or? Otherwise it makes the whole thing nonsensical to begin with. Also, another layer (local or not) just adds more latency to the output, theoretically.


MarkSchmidty commented 1 year ago

@MarkSchmidty I've been looking for one. We've tried llama.cpp, oobabooga, and huggingface. Is there another option you have working with vicuna?

https://github.com/hyperonym/basaran works with all HuggingFace format GPU models. "Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models."


https://github.com/alexanderatallah/Alpaca-Turbo is a local server implementation of the OpenAI API for Alpaca using alpaca.cpp, which can be easily modified for other LLaMA-based models. Like Basaran, it's a drop-in replacement endpoint for the OpenAI API anywhere the OpenAI API works.

Are we talking local server or?

Yes

Maybe this? Maybe I have seen one with more stars but I don't remember where.

That looks similar to alpaca-turbo.

9cento commented 1 year ago

Well, at least it's something to cope with for the moment. I'll look into it.


DataBassGit commented 1 year ago

Nice. All three of these look promising. I'll test sometime this week.

st4s1k commented 1 year ago

Would be huge if it can run llama.cpp locally.

0xgeert commented 1 year ago

Just to add one: a drop-in OpenAI API replacement as a frontend for llama.cpp. With 400+ stars atm, it seems to be one of the more popular options:

https://github.com/abetlen/llama-cpp-python
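
It can also be used directly as Python bindings; basic usage looks roughly like this (the model path is a placeholder for whatever local ggml .bin you have):

from llama_cpp import Llama

llm = Llama(model_path="./models/ggml-vicuna-7b-q4_0.bin")  # placeholder path
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])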

alkeryn commented 1 year ago

@MarkSchmidty Wait, that's great. If it works we can basically run it on local models with no modifications whatsoever.

drusepth commented 1 year ago

Of note, Basaran didn't support Llama-based models until today, but today's release finally adds support for them.

peakji commented 1 year ago

Basaran's author here, thanks for recommending our project!

Compared to other LLaMA-specific projects, the advantage of integrating Basaran is that it allows users to freely choose various models from the past and future, not just limited to LLaMA variants. We believe this is particularly important for non-English speaking users, as currently LLaMA is mainly trained on Latin languages.

For compatibility, currently you only need to modify one environment variable to migrate from the OpenAI text completion API to Basaran, without modifying a single line of code. In the next few weeks, we will also add message template support for models that have undergone instruction tuning, making them fully compatible with the OpenAI Chat API.
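
For example, roughly (the exact environment variable and local address depend on how you deploy Basaran, so treat these values as placeholders):

import os
import openai

os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:80/v1"  # placeholder Basaran address
openai.api_base = os.environ["OPENAI_API_BASE"]

completion = openai.Completion.create(
    model="user/llama-7b-hf",  # placeholder model id served by Basaran
    prompt="Hello, my name is",
    max_tokens=32,
)
print(completion.choices[0].text)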

9cento commented 1 year ago

Basaran's author here, thanks for recommending our project!

Compared to other LLaMA-specific projects, the advantage of integrating Basaran is that it allows users to freely choose various models from the past and future, not just limited to LLaMA variants. We believe this is particularly important for non-English speaking users, as currently LLaMA is mainly trained on Latin languages.

For compatibility, currently you only need to modify one environment variable to migrate from the OpenAI text completion API to Basaran, without modifying a single line of code. In the next few weeks, we will also add message template support for models that have undergone instruction tuning, making them fully compatible with the OpenAI Chat API.

BASEDRAN

PurpleStarChild commented 1 year ago

I'm looking at doing something similar to this, to make it so the oobabooga API is used for accessing a local LLM. I'm mostly looking at creating a proof-of-concept integration, but I'm still learning the overall code structure of AutoGPT.

Could someone point me in the direction of the key areas that would need to be touched in order to do this? Or, alternatively, the areas that make direct calls to OpenAI, so I know where to start?

MarkSchmidty commented 1 year ago

I'm looking at doing something similar to this, to make it so the oobabooga API is used for accessing a local LLM. I'm mostly looking at creating a proof-of-concept integration, but I'm still learning the overall code structure of AutoGPT.

Could someone point me in the direction of the key areas that would need to be touched in order to do this? Or, alternatively, the areas that make direct calls to OpenAI, so I know where to start?

Oobabooga's API is not OpenAI Completions API compatible, so it will not work with Auto-GPT. Basaran is a local OpenAI Completions API server for local models. You'll want to use that instead: https://github.com/hyperonym/basaran

PurpleStarChild commented 1 year ago

Ahh thank you! I'll give that a try.

keldenl commented 1 year ago

I made a drop-in replacement for OpenAI via llama.cpp (I had some issues with the Python bindings llama-cpp-python). I almost got it working with AutoGPT; I just need to fully support same-dimension embeddings. If anybody has any pointers, please let me know. I got Auto-GPT working for a couple of cycles, but it's really inconsistent.

https://github.com/keldenl/gpt-llama.cpp
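
For reference, one naive way to fake same-dimension embeddings (a sketch only, not what gpt-llama.cpp actually does) is to zero-pad the shorter local vectors up to ada-002's 1536 dimensions:

import numpy as np

def pad_embedding(vec, target_dim=1536):
    # Zero-pad (or truncate) a local embedding so downstream code that expects
    # ada-002's 1536-dimensional vectors keeps working. Crude, but unblocks testing.
    vec = np.asarray(vec, dtype=np.float32)
    if vec.shape[0] >= target_dim:
        return vec[:target_dim]
    return np.pad(vec, (0, target_dim - vec.shape[0]))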