Closed: kenfink closed this issue 1 year ago.
Here's a workaround that seems to work for now. It's totally inappropriate for future growth, so I'm not creating a PR for this. But as they say, a stupid idea that works isn't stupid.
In /llms/openai.py, line 23 is: openai.api_key = api_key. Add line 24 beneath it: openai.api_base = api_key
Then, in the web UI, set the OpenAI API Key to http://YOUR.HOST.IP:PORT/v1. Be sure to leave off the trailing slash at the end of the endpoint; the backend server adds it back in.
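For clarity, here is a minimal sketch of what those two lines end up doing. The apply_workaround helper is hypothetical (in the real file the assignments live inside the OpenAi class's __init__); it just shows how the value entered as the "API key" in the web UI gets reused as the base URL:

import openai  # the pre-1.0 openai SDK, which exposes module-level api_key/api_base

def apply_workaround(api_key):
    # Existing behaviour: the value from the web UI is stored as the API key.
    openai.api_key = api_key
    # Workaround: reuse the same value as the base URL, so entering
    # http://YOUR.HOST.IP:PORT/v1 as the "API key" redirects all requests there.
    # Local OpenAI-compatible servers typically ignore the key anyway.
    openai.api_base = api_key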
I tried this and it doesn't work; it just keeps "thinking" for an extremely long time with no response. Message me on Discord: Kita#7214
I'm still having trouble running the project, but I thought a simple solution would be:
import os  # needed for os.getenv, if not already imported in llms/openai.py

def __init__(self, api_key, image_model=None, model="gpt-4", temperature=0.6, max_tokens=4032, top_p=1,
             frequency_penalty=0,
             presence_penalty=0, number_of_results=1):
    openai.api_base = os.getenv("OPENAI_API_BASE", default="https://api.openai.com/v1")
Then one could simply export an environment variable to set the base URL; if the variable isn't set, the default URL is used.
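As a quick illustration of the fallback behaviour (the URL below is just a placeholder for a local OpenAI-compatible endpoint; in practice you would export the variable in your shell or compose file before launching the backend):

import os

# Illustrative only: simulate exporting the variable before startup.
os.environ["OPENAI_API_BASE"] = "http://localhost:5001/v1"

print(os.getenv("OPENAI_API_BASE", default="https://api.openai.com/v1"))
# -> http://localhost:5001/v1; prints the api.openai.com default if the variable is unset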
I added a static method to the AgentExecutor class (superagi/jobs/agent_executor.py, line 91):
@staticmethod
def get_model_api_base_url():
    base_url = get_config("OPENAI_API_BASE_URL")
    # shell_url = os.getenv("OPENAI_API_BASE_URL")
    return base_url
Then I updated the OpenAi class's initialization function to include the api_base parameter (superagi/llms/openai.py, line 11):
def __init__(self, api_key, api_base="https://api.openai.com/v1", image_model=None, model="gpt-4", temperature=0.6, max_tokens=4032, top_p=1,
             frequency_penalty=0,
             presence_penalty=0, number_of_results=1):
    openai.api_base = api_base
Then, when calling the executor agent, just pass in the parameter (agent_executor.py, around line 151):
spawned_agent = SuperAgi(ai_name=parsed_config["name"], ai_role=parsed_config["description"],
                         llm=OpenAi(api_base=AgentExecutor.get_model_api_base_url(), model=parsed_config["model"], api_key=model_api_key),
                         tools=tools, memory=memory,
                         agent_config=parsed_config)
Finally, in the config.yaml file (line 5):
OPENAI_API_BASE_URL: https://api.openai.com/v1
I'm still trying to run the project, but that's an improvement, because now one can set an OPENAI_API_BASE_URL entry in config.yaml to change the target URL of the OpenAI API.
Okay, I got it to work with text-generation-webui. The solution is a bit hacky at the moment, but the agent is using a local GGML model that is being executed across multiple GPUs. The above solution definitely works. In order to get it to work with TGWUI, though, I had to make the OpenAI API run on my computer's LAN interface, since getting the Docker image to access port 5001 on the host machine's loopback interface was difficult. To make text-generation-webui run on the LAN interface, I edited the extensions/openai/script.py file and added the following:
Below the import statements, at line 17, I added:
# Requires "import socket" if the script does not already import it.
ipsocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ipsocket.connect(("8.8.8.8", 80))  # no packets are sent; this just selects the outbound interface
localip = ipsocket.getsockname()[0]
This creates a variable containing the IP address of the host machine's primary network interface. On or around line 762 you will find the following line:
server_addr = ('0.0.0.0' if shared.args.listen else '127.0.0.1', params['port'])
Change it to use the localip variable instead of the static string '127.0.0.1':
server_addr = ('0.0.0.0' if shared.args.listen else localip, params['port'])
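Once the extension is bound to the LAN address, a quick way to confirm the endpoint is reachable from another machine (or from inside the SuperAGI container) is to hit the models route, assuming the extension implements the standard /v1/models path and is listening on port 5001; YOUR.HOST.IP is a placeholder for the LAN address discovered above:

import requests  # assumes the requests package is available

# Replace YOUR.HOST.IP with the LAN address printed from localip above.
resp = requests.get("http://YOUR.HOST.IP:5001/v1/models", timeout=10)
print(resp.status_code, resp.json())  # a 200 with a model list means the endpoint is reachable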
Please note that this is a quick-and-dirty solution for using local LLMs. A much better solution would be to add llama-cpp-python functionality to the app and to create a settings interface for use with llama-cpp-python. I guess I can work on that next. For now, though, this is a quick way to get SuperAGI to use local LLMs.
Okay, using text-generation-webui seems to be running into errors parsing JSON from SuperAGI. I'm going to need to do some more investigating here. That error, though, is off topic for this issue. As it stands, local LLMs can be used by editing the aforementioned files in the SuperAGI project.
Okay, I created a PR that merges Text Generation Web UI to manage locally hosted language models. The PR creates a Docker image for TGWUI and adds settings to use it in the configuration file. Local LLMs are a go!
Here's another option: The Fastchat folks published this today https://lmsys.org/blog/2023-06-09-api-server/
I'd recommend we stick with the name OPENAI_API_BASE rather than OPENAI_API_BASE_URL, because the former is the standard for Langchain.
Absolutely, I'll post that change on my next commit.
Maybe it would be worth opening a separate, smaller PR than #289 so people can use this base URL change sooner? I'm happy to do that. I just applied @sirajperson's patches from https://github.com/TransformerOptimus/SuperAGI/issues/243#issuecomment-1583275247 locally and they work great!
@alexkreidler On my fork I have begun to implement locally run LLMs. The fork is currently under development and is not ready to be merged yet. It would be great if you could create a separate PR. Thanks for the help!
Please consider the following: add Django to the end of the requirements.txt file:
Django==4.2.2
Add import statements at line 6 of agent_executor.py:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError
Allow the get_agent_api_base() method to validate the supplied URL, or return the default OpenAI API base if validation fails:
@staticmethod
def get_agent_api_base():
    base_url = get_config("OPENAI_API_BASE")
    # shell_url = os.getenv("OPENAI_API_BASE")
    url_validator = URLValidator()  # verify_exists is no longer an accepted argument in Django 4.x
    try:
        url_validator(base_url)
    except ValidationError:
        return "https://api.openai.com/v1"
    return base_url
Finally, modify the call in the execute_next_action function, on or around line 160, to:
spawned_agent = SuperAgi(ai_name=parsed_config["name"], ai_role=parsed_config["description"],
                         llm=OpenAi(api_base=AgentExecutor.get_agent_api_base(), model=parsed_config["model"], api_key=model_api_key),
                         tools=tools, memory=memory,
                         agent_config=parsed_config)
The following lines in the OpenAI class in openai.py can remain the same:
def __init__(self, api_key, api_base="https://api.openai.com/v1", image_model=None, model="gpt-4", temperature=0.6, max_tokens=4032, top_p=1,
             frequency_penalty=0,
             presence_penalty=0, number_of_results=1):
    openai.api_base = api_base
This will make the use of a custom base URL more robust.
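As an illustration of the intended fallback behaviour, here is a small standalone version of the same logic. The resolve_api_base helper is hypothetical and only for demonstration; it assumes Django is installed as suggested above:

from django.core.exceptions import ValidationError
from django.core.validators import URLValidator

def resolve_api_base(configured_value, default="https://api.openai.com/v1"):
    # Return the configured URL if it parses as a valid URL, otherwise the default.
    try:
        URLValidator()(configured_value)
    except ValidationError:
        return default
    return configured_value

print(resolve_api_base("not-a-url"))                 # -> https://api.openai.com/v1
print(resolve_api_base("http://localhost:5001/v1"))  # -> http://localhost:5001/v1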
@sirajperson Hello Jonathan! I hope you're doing well. I'm sorry for using this issue instead of creating a new one, but I have searched high and low and can't find a solution... I have tried using SuperAGI + Oobabooga on the backend with the dockerized version by Atinoda, as you've pointed out here.
However, no matter what I do, whether I build the image by cloning their repo or copy "text-generation-webui" and try building the image locally, I always get the same error in the SuperAGI PowerShell window:
"(host='super__tgwui', port=5001): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7f4de27160>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))"
If I use the command "Test-NetConnection 127.0.0.1 -p 5001" in PowerShell, it returns true, which means the port is open.
I have even uninstalled Docker and installed it again, but I'm still facing this... Do you have any idea of what I might be doing wrong?
Thanks a lot!
@eriksonssilva If you place the models that you would like to use under:
SuperAGI/tgui/config/models/
they will be copied to the container to be used in TGWUI. Presently, only GPTQ models with a context length greater than 4096 tokens are working.
The line "(host='super__tgwui', port=5001)" tells SuperAGI to use the Docker bridge and hostname resolution. In order to get GPU support, you will need to follow the Docker instructions for setting up the target machine to use the Docker image. Those instructions can be found here.
@sirajperson Thanks for the quick answer! So I've been messing around here and I KINDA made it "work"... I am using Oobabooga but without Docker... Basically, I've used the openai extension and, after a lot of trial and error, using my IPv4 address instead of localhost or 127.0.0.1, it stopped giving that error. However, now something weirder happens... When I start the SuperAGI agent, it will only repeat the same thing over and over... I have used the models that are available at the link you mentioned, and I can chat with the model through the web UI without issues. On the other hand, SuperAGI does not go anywhere: the output shows things like: "Exception: When loading characters/instruction-following/None.yaml: FileNotFoundError(2, 'No such file or directory')"
"Warning: Loaded default instruction-following template for model. Warning: Ignoring max_new_tokens (3250), too large for the remaining context. Remaining tokens: 1168 Warning: Set max_new_tokens = 1168"
And in SuperAGI the answers are always "vague".
Also, each command takes A LOT of time...
As a test I've set the goal "List 10 mind-boggling movies" and the instructions "Use google to find the movies.".
This might not be 100% related to SuperAGI, but could you (or perhaps anyone) give me a hint?
@eriksonssilva Bravo Erik, that's definitely progress. What model are you using? Also, are you using llama-cpp to offload layers to your GPU? I can tell you that some of the LLaMA models are just not that great yet. I have been working on getting MPT-30B as the brain behind the drop-in API endpoint, because its instruct capabilities are starting to deliver the quality of responses that would make it usable.
@sirajperson If I try using MPT-30B I think my computer will stand up and walk out of the room! "Nah, dude. You're expecting too much from me" lol. I have tried llama-7b-4bit and TheBloke_open-llama-7b-open-instruct-GPTQ, but both produce similar results. I can only use llama.cpp with llama-7b-4bit; for some reason the other one does not allow me to use it... It's funny that not even GPT-3.5 Turbo is giving me satisfying results (for more complex tasks), but I must admit that each time I refresh the usage page and see that the cost is increasing, I start sweating! haha
@alexkreidler Yeah, those models don't have a very high perplexity score. If you are able to use GPTQ models you should try MPT-30B GPTQ. What I've been doing, because my two old M40s can't run GPTQ models, is renting cheaper GPU instances at runpod.io and running them there. But please try out MPT-30B and share your results. Also be aware that MPT-30B has special message-termination characters; those will have to be configured in the constraints section of the agent.
@alexkreidler This also happened on the 20th, so it may be possible to use this to run inference on MPT GGML-based models from the GPU: https://postgresml.org/blog/announcing-gptq-and-ggml-quantized-llm-support-for-huggingface-transformers
The discussion on this issue has drifted from being able to use a different endpoint for the API to how to get a local LLM working for task agents. Please refer to #542 for discussions on task-agent functionality.
As it stands, the use of a different API endpoint is working correctly. Since one can point the task agent at any API endpoint, whether or not the selected endpoint works with the task agent is beyond the scope of this issue. I'm hoping this issue will be closed soon, since the alternative-endpoint improvement is working great.
@TransformerOptimus I was wondering if you could close this issue, since the OPENAI_API_BASE implementation is working without fault.
Awesome @sirajperson . Closing this.
Can we further support configuring this in the web GUI? Restarting the app whenever you want to change OPENAI_API_BASE is very time-consuming.
Please add better documentation for this in the help.
Add support for the OPENAI_API_BASE endpoint environment variable. Ideally, add an input for "OpenAI API Endpoint" in the GUI top bar / Settings, under "OpenAI API Key".
This is important right now because it will allow us to point to any OpenAI-API-compatible drop-in.
Use-case examples: ChatGPT-to-API for those of us who don't have GPT-4 API access or want to use the Plus membership instead of per-token costs for 3.5-Turbo; llama-cpp-python provides a drop-in OpenAI-compatible API endpoint; Oobabooga provides an OpenAI-compatible API endpoint plugin.
Realizing that this project will likely have direct support for all sorts of local models and various APIs in the future, this will enable a lot of flexible testing until then.
Related feature request: Each agent should have its own OPENAI_API_KEY and OPENAI_API_BASE. This may already be baked into the plans for enabling various LLMs, since each will have its own settings. But here's a currently useful use case: Agent 1 points at localhost:PORT1 for Gorilla with key="model-name", Agent 2 points at localhost:PORT2 for StarCoder with key="other-model", and Agent 3 points at api.openai.com for paid inference.
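A rough sketch of what per-agent endpoint settings could look like. All names, ports, and keys below are illustrative placeholders (PORT1/PORT2 as in the use case above), not an existing SuperAGI API; the import assumes the module path discussed earlier in this thread:

from superagi.llms.openai import OpenAi  # assumed path based on superagi/llms/openai.py

# Hypothetical per-agent LLM configuration.
AGENT_LLM_SETTINGS = {
    "agent_1": {"api_base": "http://localhost:PORT1/v1", "api_key": "model-name"},      # Gorilla
    "agent_2": {"api_base": "http://localhost:PORT2/v1", "api_key": "other-model"},     # StarCoder
    "agent_3": {"api_base": "https://api.openai.com/v1", "api_key": "OPENAI_API_KEY"},  # paid inference
}

def build_llm(agent_name, model):
    # Sketch only: construct the OpenAi wrapper shown earlier in the thread
    # with the agent's own endpoint and key.
    cfg = AGENT_LLM_SETTINGS[agent_name]
    return OpenAi(api_base=cfg["api_base"], api_key=cfg["api_key"], model=model)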