Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
https://agpt.co
MIT License
166.66k stars 44.11k forks

Does AutoGPT really work #4728

Closed KittHaab closed 1 year ago

KittHaab commented 1 year ago

Summary 💡

I like AutoGPT from a technical point of view, but I have tested versions 0.3.0 and 0.4.0 tens of times and never reached the expected result.

Examples 🌈

Example 1: Asked AutoGPT v0.4.0 (via Docker, with GPT-3.5) to search for private primary care clinics in Nairobi AND specifically asked it to exclude all hospitals. After 15 "yes" approvals, AutoGPT tried to google, then gave up, looked through the Yellow Pages, and listed 2 private clinics and 98 women's hospital branches (most of them outside Nairobi).

And the filtering task never even finished, because I got: "openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 8686 tokens. Please reduce the length of the messages."

Example 2: Made a second attempt with an even sharper goal: "Please find private clinics (excluding hospitals) offering primary care services in Nairobi". Although the search was slightly better targeted, it still ended up focusing on hospitals.

And even that attempt ended, after 25 "yes" approvals, with "openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4850 tokens. Please reduce the length of the messages."
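
For context, that InvalidRequestError simply means the accumulated message history grew past gpt-3.5-turbo's 4,097-token context window. Purely as an illustration of the failure mode (this is not AutoGPT's actual memory or summarization logic, and the helper names below are made up), a chat history could be pre-trimmed with tiktoken roughly like this:

import tiktoken

# Sketch only: count tokens in the message contents and drop the oldest
# non-system messages until the request fits under the context window.
MAX_PROMPT_TOKENS = 4097 - 1000  # leave headroom for the model's reply

def count_tokens(messages, model="gpt-3.5-turbo"):
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)

def trim_history(messages):
    # messages[0] is assumed to be the system prompt and is always kept
    while count_tokens(messages) > MAX_PROMPT_TOKENS and len(messages) > 2:
        del messages[1]  # drop the oldest user/assistant message
    return messages

Note that a real chat request also adds a small per-message overhead, so this count is an approximation.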

Motivation 🔦

I like the idea of AutoGPT and I'm not whining, but I'd like to hear feedback from other users - does AutoGPT work for your queries?

Does it only work with simple queries, or have you succeeded with complex queries as well?

A little suggestion, maybe: if the task includes wording like "exclude X" (hospitals, in my case), why did the Google search come out so broad - {'query': 'private clinics in Nairobi offering primary care services'} - rather than {'query': 'private clinics in Nairobi offering primary care services -hospitals'}? That could have avoided tens of unnecessary queries.
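
As a rough illustration of that suggestion (the function below is hypothetical and not part of AutoGPT), the generated query could be post-processed to turn "exclude X" wording in the goal into Google's "-x" exclusion operator:

import re

# Illustrative only: append a "-term" operator for anything the goal excludes.
def add_exclusions(query: str, goal: str) -> str:
    excluded = re.findall(r"exclud(?:e|ing)\s+(?:all\s+)?(\w+)", goal, re.IGNORECASE)
    for term in excluded:
        query += f" -{term.lower()}"
    return query

print(add_exclusions(
    "private clinics in Nairobi offering primary care services",
    "Please find private clinics (excluding hospitals) offering primary care services in Nairobi",
))
# -> private clinics in Nairobi offering primary care services -hospitals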

ScarletPan commented 1 year ago

Hi, our R&D team is working on benchmarking Auto-GPT and other AI agents to evaluate their performance in various situations. We hope that the upcoming benchmark results will address this issue and help improve Auto-GPT's performance. I'll also take note of the two examples you mentioned.

pdenya commented 1 year ago

I would love to see a (growing) list of real-world applications linked from the readme. At this point I'm extremely interested to see how anyone has been successful with this repo, no matter the use case, so that I can improve my own attempts. Thanks!

Boostrix commented 1 year ago

Consider looking at challenges.

Also, most folks consider 0.2.2 to be the last version to work really well for them.

Bontah commented 1 year ago

So far I've tried with every version to create a meal plan for my specific needs, without any luck. It always gets stuck in a stupid way, either by asking Google or by "creating" some files.

wyang22 commented 1 year ago

Yes, I want to ask the same question. The idea of AutoGPT is very exciting and promising, but unfortunately it never seems to generate the expected result for any of the tasks I have tried. Does anybody have successful examples we can try? It would be helpful to publish guidelines on what kinds of tasks are better suited to it, to avoid wasting a lot of money and time on useless attempts.

moonlockwood commented 1 year ago

I think it helps to approach this as a brand new field (the fact that LLMs can do stuff like this at all) rather than as a finished product. This is the very beginning of all this, and AutoGPT is a very nice approach to figuring out how it could all work and evolve. There are so many things to get right as a whole, not just AutoGPT. A real, powerful, unified multi-modal embedding space is what I'm excited for. For me, it helps to remember that LLMs think in meaning, not words, and everything is tokens.

wyang22 commented 1 year ago

I think it helps to approach this as a brand new field (the fact that LLMs can do stuff like this at all) rather than as a finished product. This is the very beginning of all this, and AutoGPT is a very nice approach to figuring out how it could all work and evolve. There are so many things to get right as a whole, not just AutoGPT. A real, powerful, unified multi-modal embedding space is what I'm excited for. For me, it helps to remember that LLMs think in meaning, not words, and everything is tokens.

Good statement, and I totally agree; that is why I am interested in this as well. But again, let's come back to reality first: does this really work? Any examples? From my observations there are still many engineering improvements to make, but what is not clear to me is whether, when it does not work, the cause is the engineering side or that models like ChatGPT simply are not there yet for this.

Boostrix commented 1 year ago

It's obviously working for some of us rather well - that is why some of us teamed up to improve things even more.

If it's not working for others, that's usually a matter of poor prompting and of setting poor or overly ambitious objectives in the first place.

Given the nature of the beast, it should go without saying that people with a coding background probably have a much easier time coaxing the app into doing what they want.

kjongdae commented 1 year ago

I started doing independent development because watching YouTube videos about version 0.2.2 for fun gave me a realization about where the world is heading, and I immediately thought I could succeed with it, so I delved into studying artificial intelligence. However, collecting too many opinions from people led to conflicting views, and I became tired of tweaking the already released versions. So I decided to start developing autonomously. It's regrettable that I couldn't participate in this great piece of open-source history, but if I wait any longer, I feel like I won't be able to keep up with the passage of time.

kjongdae commented 1 year ago
import os
import subprocess
from pathlib import Path

from autogpt.agent.agent import Agent
from autogpt.commands.command import command
from autogpt.config import Config
from autogpt.logs import logger
from autogpt.setup import CFG
from autogpt.workspace.workspace import Workspace

@command(
    "execute_python_code",
    "Create a Python file and execute it",
    '"code": "<code>", "basename": "<basename>"',
)
def execute_python_code(code: str, basename: str, agent: Agent) -> str:
    """Create and execute a Python file and return the STDOUT of the executed code.
    If there is any data that needs to be captured, use a print statement.

    Args:
        code (str): The Python code to run
        basename (str): A name to be given to the Python file

    Returns:
        str: The STDOUT captured from the code when it ran
    """
    ai_name = agent.ai_name
    directory = os.path.join(agent.config.workspace_path, ai_name, "executed_code")
    os.makedirs(directory, exist_ok=True)

    if not basename.endswith(".py"):
        basename = basename + ".py"

    path = os.path.join(directory, basename)

    try:
        with open(path, "w+", encoding="utf-8") as f:
            f.write(code)

        return execute_python_file(f.name, agent)
    except Exception as e:
        return f"Error: {str(e)}"

@command("execute_python_file", "Execute Python File", '"filename": "<filename>"')
def execute_python_file(filename: str, agent: Agent) -> str:
    """Execute a Python file and return the output.

    Args:
        filename (str): The name of the file to execute

    Returns:
        str: The output of the file
    """
    logger.info(
        f"Executing python file '{filename}' in working directory '{CFG.workspace_path}'"
    )

    if not filename.endswith(".py"):
        return "Error: Invalid file type. Only .py files are allowed."

    workspace = Workspace(
        agent.config.workspace_path, agent.config.restrict_to_workspace
    )

    path = workspace.get_path(filename)
    if not path.is_file():
        return (
            f"python: can't open file '{filename}': [Errno 2] No such file or directory"
        )

    try:
        result = subprocess.run(
            ["python3", str(path)],
            capture_output=True,
            encoding="utf8",
            cwd=CFG.workspace_path,
        )
        if result.returncode == 0:
            return result.stdout
        else:
            return f"Error: {result.stderr}"
    except Exception as e:
        return f"Error: {str(e)}"

@command(
    "execute_shell",
    "Execute Shell Command, non-interactive commands only",
    '"command_line": "<command_line>"',
)
def execute_shell(command_line: str, agent: Agent) -> str:
    """Execute a shell command and return the output.

    Args:
        command_line (str): The command line to execute

    Returns:
        str: The output of the command
    """
    try:
        result = subprocess.run(
            command_line,
            shell=True,
            capture_output=True,
            text=True,
            cwd=CFG.workspace_path,
        )
        if result.returncode == 0:
            return result.stdout
        else:
            return f"Error: {result.stderr}"
    except Exception as e:
        return f"Error: {str(e)}"

def validate_command(command: str, config: Config) -> bool:
    """Validate a command to ensure it is allowed.

    Args:
        command (str): The command to validate

    Returns:
        bool: True if the command is allowed, False otherwise
    """
    if not command:
        return False

    command_name = command.split()[0]

    if config.shell_command_control == "allowlist":
        return command_name in config.shell_allowlist
    else:
        return command_name not in config.shell_denylist

This is my "execute_code.py".

The code above is a simple modification that completely removes the constraints on Docker and shell commands.

The restrictions were probably introduced from around version 0.2.2 onward because things became increasingly risky. However, if you try running this on a completely independent AWS Lightsail instance, you can experience the feeling of the old days.

Please be extremely cautious. It could potentially break the system entirely, so it is not advisable to use it in interactive mode.
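
One observation on the listing above: validate_command is defined but never called from execute_shell, which is precisely what lifts the shell restrictions. For anyone who wants to keep a minimal safety net while experimenting, a hypothetical wrapper (not part of the posted modification, reusing only names already defined in the file) could look like:

# Hypothetical guard: run validate_command against the agent's config before
# delegating to the unrestricted execute_shell above.
def execute_shell_guarded(command_line: str, agent: Agent) -> str:
    if not validate_command(command_line, agent.config):
        return f"Error: shell command not allowed: {command_line!r}"
    return execute_shell(command_line, agent)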

github-actions[bot] commented 1 year ago

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] commented 1 year ago

This issue was closed automatically because it has been stale for 10 days with no activity.