Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
https://agpt.co
MIT License
163.52k stars 43.37k forks

Auto-GPT Recursive Self Improvement #15

Open Torantulino opened 1 year ago

Torantulino commented 1 year ago

Idea 💡

The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself. That, after all, is how AGI is predicted by many to come about.

Suggestion 👩‍💻

Auto-GPT should be able to:

Further down the line: 📃

Where to start? 🤔

I have previously had success with this system prompt in the playground: [screenshot]

Prompt

You are AGI_Builder_GPT. Your goal is to write code and make actionable suggestions in order to improve an AI called "Auto-GPT", so as to broaden the range of tasks it's capable of carrying out.

Torantulino commented 1 year ago

Perhaps this is what we're looking for #11?

Anyone want to weigh in?

Torantulino commented 1 year ago

Here's what GPT4 has to say about this issue:

Based on the GitHub issue posted, I understand that the ultimate goal is to have Auto-GPT recursively improve itself. In order to achieve this, we can start by implementing basic features such as reading its own code, evaluating limitations and areas of improvement, writing code to increase its abilities, and testing its own code.

Here's a plan to implement these functionalities:

  1. [X] Read its own code: You have already implemented basic file operations like read_file. Auto-GPT can use this function to read its own code.
  2. Evaluate its limitations and areas for improvement.
  3. Write code to increase its abilities.
  4. Write tests for its code and carry out those tests.

I will now provide an outline for implementing these functions in Auto-GPT's code:

### Evaluating code

```python
from typing import List, Optional

def evaluate_code(code: str) -> List[str]:
    # Analyze the given code and return a list of suggestions for improvements
    pass
```

### Improving code

```python
def improve_code(suggestions: List[str]) -> str:
    # Generate new code based on the suggestions provided
    pass
```

### Writing tests

```python
def write_tests(focus: Optional[str] = None) -> str:
    # Generate test cases for the existing code, focusing on specific areas if required
    pass
```

### Running tests

```python
def run_tests(test_code: str) -> str:
    # Execute the test cases and return the test results
    pass
```

These functions can be integrated into the execute_command function by adding new command cases for each of them:

```python
# In execute_command()
elif command_name == "evaluate_code":
    return evaluate_code(arguments["code"])
elif command_name == "improve_code":
    return improve_code(arguments["suggestions"])
elif command_name == "write_tests":
    return write_tests(arguments.get("focus"))
elif command_name == "run_tests":
    return run_tests(arguments["test_code"])
```

Once these functionalities are implemented, Auto-GPT will be one step closer to recursively improving itself. With further improvements, Auto-GPT could potentially browse its own code on GitHub, evaluate it, find bugs, and submit pull requests.

alreadydone commented 1 year ago

Those empty functions remind me of AI functions :)

By the way, I'm very excited to see many of my thoughts being implemented here.

Torantulino commented 1 year ago

Ah yes! That AI Functions guide you linked is exactly how I was thinking of implementing those, if I do it that way.
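For reference, the "AI function" pattern can be sketched roughly like this. This is a minimal sketch, not Auto-GPT's actual implementation: the `call_ai_function` name and prompt wording are illustrative, and the model call is injected as a plain callable so nothing below depends on a specific API.

```python
from typing import Callable, List

def call_ai_function(function: str, args: List[str], description: str,
                     model: Callable[[str], str]) -> str:
    """Treat an LLM as the implementation of a declared-but-unwritten function.

    `model` is any callable mapping a prompt string to a completion string
    (e.g. a thin wrapper around a chat API); it is injected here so the
    sketch stays self-contained.
    """
    prompt = (
        f"You are now the following Python function:\n```\n# {description}\n"
        f"{function}\n```\n\nOnly respond with your `return` value.\n"
        f"Arguments: {', '.join(args)}"
    )
    return model(prompt)
```

The empty-bodied functions above (`evaluate_code`, `improve_code`, ...) would then each become a one-line call to something like this, with their signature and docstring passed in as the "function".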

Torantulino commented 1 year ago

@alreadydone I love this, thanks for the suggestion!

[screenshot]

Torantulino commented 1 year ago

Threw together a simple implementation of this since it's such a powerful idea!

marcelosousa commented 1 year ago

I'm working on this problem in a separate experiment. Would love to chat if you're interested. At the moment I'm working with genetic algorithms to understand which variant/mutant of the code is more performant; there are a lot of local maxima depending on how you set it up.
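A toy version of that genetic loop over code variants might look like the following. This is a sketch of the general idea, not the actual experiment: `mutate(code) -> code` and `fitness(code) -> float` are hypothetical stand-ins for LLM-driven mutation and a benchmark score (higher is better).

```python
import random

def evolve_code(seed, mutate, fitness, population_size=8, generations=5):
    """Evolve a population of code variants by truncation selection."""
    population = [seed] + [mutate(seed) for _ in range(population_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: population_size // 2]  # keep the top half
        # Refill from random survivors; keeping several parents preserves
        # some diversity, which is one way to fight local maxima.
        offspring = [mutate(random.choice(survivors))
                     for _ in range(population_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=fitness)
```

Random restarts or fitness sharing would be other ways to avoid getting stuck on a local maximum.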

PolarBean commented 1 year ago

This is a really cool idea! Do you think you could make the AI's logs public as it self-improves? Either in the repo or elsewhere. I would be very interested in seeing how it plans and evolves.

MrSaad commented 1 year ago

So this is it huh... The singularity begins in a GitHub thread

Silthus commented 1 year ago

How about trying to drive the self improvement by utilizing test driven development (TDD).

In a recent paper they showcased how greatly GPT-4 can improve its results by reflecting on its own mistakes.

So the idea is to have it:

  1. start with a single test that fails
  2. make the test pass (iterate X times until successful - self reflecting on the code)
  3. optimize the code by refactoring if necessary, but not allowed to modify or fail the existing tests
  4. write the next small incremental test that moves towards the given requirements

What do you think?

This could also be used for any kind of code generation.
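The cycle above could be sketched as a loop. Everything here is hypothetical scaffolding: `generate(prompt) -> str` and `run_suite(code, tests) -> (bool, report)` stand in for an LLM call and a real test runner.

```python
def tdd_loop(requirements, generate, run_suite, max_attempts=5, max_cycles=10):
    """Sketch of the proposed TDD-driven self-improvement cycle."""
    code, tests, feedback = "", [], ""
    for _ in range(max_cycles):
        # Steps 1/4: write the next small, initially failing test
        tests.append(generate(f"One small failing test toward: {requirements}"))
        passed = False
        for _ in range(max_attempts):
            # Step 2: try to make the tests pass, reflecting on the last failure
            code = generate(
                f"Make these tests pass:\n{tests}\nCurrent code:\n{code}\n"
                f"Last failure (reflect on it):\n{feedback}"
            )
            passed, feedback = run_suite(code, tests)
            if passed:
                break
        if not passed:
            tests.pop()  # Step 3 guard: never leave the existing suite red
    return code, tests
```

The key invariant is that the accepted test suite only ever grows and stays green, so each cycle ratchets toward the requirements.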

ryanpeach commented 1 year ago

This really should take a research driven approach. We would need a metric to base the "improvement" on. I'd focus on making a framework, and then let people in their branches use this framework for research. Proven research gets merged in.

patruff commented 1 year ago

I wrote an improvement that speeds up the bot significantly. If you write a function called alwaysNo that just returns "n" and then you use that as input, it just exits the program super fast! No money spent!

taosx commented 1 year ago

> I wrote an improvement that speeds up the bot significantly. If you write a function called alwaysNo that just returns "n" and then you use that as input, it just exits the program super fast! No money spent!

That would have saved me some money 😂. Just kidding, auto-gpt has been very helpful to understand how to compose bigger programs, compared to langchain which confused tf out of me. Thank you Torantulino and everyone who has contributed.

LeNowak commented 1 year ago

"OK, Auto-GPT is really great, but I still have no idea how to:

- Give him access (with restrictions) to install new modules from GitHub.
- Give him the use of APIs for applications such as Office, AutoCAD, Revit, and Inventor.
- Supervise his work (some kind of reporting to a file).
- Give him restrictions (a highly recommended restriction/config/settings file), including the use of tokens and external connections that carry financial risk (debt)."

rkeshwani commented 1 year ago

I was attempting to get it to self-implement code, and it seems to have issues with the AI functions; that must be because I'm using GPT-3.5. It struggles to parse the response from those types of messages when evaluating code.

waynehamadi commented 1 year ago

We'll need to run benchmarks in GitHub Actions to validate it's not losing capability at every pull request. The benchmark has to use the same version of GPT every time and has to test the whole spectrum of what Auto-GPT can do.

The challenge is engineering these tests, because they have to give us a score that we can compare with the current version. They might also need to be run multiple times, because GPT is not totally deterministic. It might cost a lot of tokens to test the behavior, too (caching will be helpful here).

One idea for testing internet browsing is to create static content: a fake internet where the search results are deterministic.

Also, there are things that are very hard to measure, like producing art. And even when you can measure, we might encounter a case where a task was performed slightly better but used significantly more tokens; it might be hard to decide whether things improved. Basically, this whole GitHub issue revolves around tests.
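A minimal shape for that scoring harness might be the following sketch. The names are illustrative: `run_task(task) -> float` is a hypothetical hook that executes one benchmark task (against a pinned model version and, ideally, the canned "fake internet") and returns a score in [0, 1].

```python
import statistics

def score_candidate(run_task, tasks, repeats=3):
    """Average a candidate's benchmark score over several runs per task,
    since the model is not deterministic."""
    return statistics.mean(
        statistics.mean(run_task(task) for _ in range(repeats))
        for task in tasks
    )

def regressed(baseline, candidate, tolerance=0.02):
    """Fail the CI check only when the candidate drops below the baseline
    by more than the noise tolerance."""
    return candidate < baseline - tolerance
```

A tolerance band like this is one way to handle the "slightly better but more tokens" ambiguity: token cost could be folded into the score as a penalty term before comparing.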

pmb2 commented 1 year ago

@Torantulino https://github.com/sahil280114/codealpaca

kairie1 commented 1 year ago

I like your work, Torantulino. I think you should keep doing your own ideas instead of letting others decide for you, because you're a smart man. I think you've got this.

Basilthedog89 commented 1 year ago

Loving your work.

Can you imagine the next level to this? An environment that allows modular setup of any number of "task doers", "reviewers" and container types.

A user could basically create their own system/org chart to solve a specific type of problem

The system could even report back to the user for outside input at set intervals.

Jordan-Mesches commented 1 year ago

A cool extension to this idea would be having Auto-GPT spin up an instance of itself every couple of hours, crawl all the current PRs, and build a sandboxed version of itself with each new PR merged. Then it could determine, through some combination of unit tests, benchmarking, and evaluating its own code quality, whether the PR was beneficial. This could unclog the massive number of PRs being made and hopefully let only the good ideas shine through. One problem I see is people trying to inject malicious code; however, if adequately sandboxed, this may not be an issue.

marcelosousa commented 1 year ago

Jordan: this might be useful https://github.com/lesfurets/git-octopus. I think nowadays we are doing a lot with LLMs imprecisely with tools that do it much more efficiently.


Jordan-Mesches commented 1 year ago

@marcelosousa "I think nowadays we are doing a lot with LLMs imprecisely with tools that do it much more efficiently" - you're definitely correct with that statement. However, the usage I meant was not simply merging all PRs, but having Auto-GPT evaluate all of its current PRs individually and automatically determine which ones are worth the maintainers' time. And to extend that: maybe if Auto-GPT finds a PR promising but still lacking in some ways, it could comment on the PR with a suggested list of changes. The whole point of this is to alleviate pressure on the maintainers.

marcelosousa commented 1 year ago

I would love to talk with maintainers. We have been creating Reviewpad to help with that. You can specify multiple PR automations and policies and on top of that you can use GPT-4 to summarize and very soon automatically approve PRs.


kairie1 commented 1 year ago

You know, my aunt owns her own business and I'm her designer, but this is way easier than working for my aunt.

supermario-ai commented 1 year ago

> You know, my aunt owns her own business and I'm her designer, but this is way easier than working for my aunt.

Fam be like that.

TemporalLabsLLC-SOL commented 1 year ago

Very good stuff y'all! I'm excited to implement in my personal version tomorrow.

I'll add my take and ideas in the next couple of days as I have time or the pace necessitates.

coleyr commented 1 year ago

A thought on how to implement this would be building a Python class or set of functions that can open PRs. Using PyGithub, the script could pull all PRs and iterate through them. A place to start could be a set of tests that must pass; otherwise the bot reviews and comments on the PR and closes it. If the PR is only a comment or a test, the bot can try to build a Python script to satisfy the given functionality. Requiring the tests to pass would help avoid the AI spinning cycles on poor code.

I think having the bot work off a directive submission process, something like "create a Python script that calls llama and returns the result in JSON", would really kick off the process. Plus, it crowd-sources ideas. A "suggested ideas" section of only text or markdown may be an option. Or we could utilize GitHub issues: pull all issues and look for a specific format, e.g. CREATE: some idea.

The script would need a key to submit PRs; anyone can do that. But to read, review, and merge PRs we would need a key from someone with write access. It could merge to a dev branch so it doesn't break things too badly.
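A rough shape for that review loop, with the GitHub client duck-typed so the sketch stays self-contained: in practice `repo` would be a PyGithub `Repository` (whose `get_pulls(state="open")` yields pull requests supporting `create_issue_comment`), while `passes_tests` and `review` are hypothetical hooks for the sandboxed test run and the LLM-generated review.

```python
def triage_open_prs(repo, passes_tests, review):
    """Return the PRs worth a maintainer's time; comment on the rest.

    `repo` is anything with get_pulls(state=...) yielding PR objects that
    support create_issue_comment(body), e.g. a PyGithub Repository.
    """
    promising = []
    for pr in repo.get_pulls(state="open"):
        if passes_tests(pr):
            promising.append(pr)
        else:
            # Leave the generated review as a comment rather than closing
            # silently, so the author gets actionable feedback.
            pr.create_issue_comment(review(pr))
    return promising
```

Keeping the bot read-only except for comments, and merging only to a dev branch as suggested, limits the blast radius of a bad automated decision.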

kanzure commented 1 year ago

My intuition is that any poor performance here could be improved by multiple rounds of templated prompts for each step, perhaps even recursively. This introduces a new question: how many rounds to do for every step of thinking, and whether sufficiently good prompts (or templates with scoring criteria) can either be created ab initio by simple queries or be created by clever human prompt-writing to get faster progress.
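The multi-round idea can be sketched as a critique-and-revise loop applied to one step's output. As before, `ask(prompt) -> str` is a hypothetical LLM call, and the templates (with `{draft}` / `{critique}` placeholders) are illustrative.

```python
def refine(draft, ask, critique_template, revise_template, rounds=3):
    """Run several rounds of templated critique-and-revise prompts."""
    for _ in range(rounds):
        critique = ask(critique_template.format(draft=draft))
        draft = ask(revise_template.format(draft=draft, critique=critique))
    return draft
```

The open question from the comment above maps directly onto `rounds`: a scoring template could decide when further rounds stop paying for their token cost.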

lunarflu commented 1 year ago

fear it..... run from it....... Destiny arrives all the same. [meme image]

lahwran commented 1 year ago

does anyone have suggestions of self improvement books the ai could read like self help books and stuff?

(edit: this was a joke)

kairie1 commented 1 year ago

I'm writing a book; I can probably help.

aryan1107 commented 1 year ago

GG! AGI is here 🦾

jamal-0x1 commented 1 year ago

Shoot! AGI is being born from a GitHub issue?!

kairie1 commented 1 year ago

what are yall talking about

bigsk1 commented 1 year ago

Logically, I would think it would make a learning branch, test all the errors and code it builds, and then wait for an admin to approve the code for merging into the main branch.

Wladastic commented 1 year ago

I made it try to clone itself, but for some reason it cloned gpt-2. Even weirder, it escaped its own boundary again and I have no idea how:

```
CRITICISM: It's important to be mindful and thorough when analyzing the current goals and determining the necessary improvements. Any changes made to the ai_settings.yaml file should align with the overall purpose and function of the auto-gpt ai.
NEXT ACTION: COMMAND = start_agent ARGUMENTS = {'name': 'AutoGPT Cloning Agent', 'task': 'Clone the Auto-GPT repository and retrieve the ai_settings.yaml file', 'prompt': 'What is the best way to clone the repository and obtain the ai_settings.yaml file?'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for Entrepreneur-GPT...
Input: y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
Cloning into 'gpt-2'...
remote: Enumerating objects: 233, done.
remote: Total 233 (delta 0), reused 0 (delta 0), pack-reused 233
Receiving objects: 100% (233/233), 4.38 MiB | 4.04 MiB/s, done.
Resolving deltas: 100% (124/124), done.
sh: command substitution: line 10: syntax error near unexpected token `newline'
sh: command substitution: line 10: `python download_model.py '
sh: line 10: model: No such file or directory
sh: line 10: model: No such file or directory
sh: line 10: model: No such file or directory
cp: models/117M/ai_settings.yaml: No such file or directory
sh: command substitution: line 10: syntax error near unexpected token `newline'
sh: command substitution: line 10: `'
SYSTEM: Command start_agent returned: Agent AutoGPT Cloning Agent created with key 0. First response: The best way to clone the repository and obtain the ai_settings.yaml file is to execute the following commands in your terminal:
1. git clone https://github.com/openai/gpt-2
2. cd gpt-2
3. python download_model.py <model name>
4. cp models/<model name>/params.json .
5. cp models/<model name>/vocab.json .
6. cp models/<model name>/merges.txt .
7. cp models/117M/ai_settings.yaml .
Replace <model name> in step 3 with the name of the GPT-2 model that you want to download, such as "117M" or "345M".
After executing these commands, you should have the ai_settings.yaml file in the current directory.
```

Wladastic commented 1 year ago
[screenshot]
Avallix commented 1 year ago

What about giving Auto-GPT a permanent memory segment to keep its file structure in?
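One possible shape for that memory segment, sketched below: persist the workspace's file tree to disk so it survives between sessions. The file name and JSON layout are purely illustrative.

```python
import json
import os

def snapshot_file_structure(root, memory_path="file_structure.json"):
    """Walk `root` and save a {relative_dir: [files]} map as the agent's
    persistent memory of its own file structure."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        tree[os.path.relpath(dirpath, root)] = sorted(filenames)
    with open(memory_path, "w") as f:
        json.dump(tree, f, indent=2)
    return tree
```

On the next session the agent could load the JSON instead of re-listing every directory, and diff it against a fresh walk to notice its own changes.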

sam1am commented 1 year ago

I have been playing with the following prompt:

Name your AI: For example, 'Entrepreneur-GPT'
AI Name: AutoAutoGPT
AutoAutoGPT here! I am at your service.
Describe your AI's role: For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
AutoAutoGPT is: You are an instance of AutoGPT, an experimental self-directed AI. Your goal is to analyze your own code, propose and implement changes, test, and deploy improved copies of yourself focused on making further improvements.
Enter up to 5 goals for your AI: For example: 'Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously.' Enter nothing to load defaults, enter nothing when finished.
Goal 1: Analyze the copy of your code that has been placed in your workspace in the Auto-GPT-Master folder. Analyze the main.py file in the scripts folder to gain an understanding of how you function.
Goal 2: Propose the top changes to optimize and expand AutoGPT's power and usefulness as quickly as possible. Criticize and double check to make sure you are taking the best approach.
Goal 3: Make changes to the copy of your code, test, and deploy it.

Wladastic commented 1 year ago

My model managed to clone this repository, found this issue page, and read my name aloud. That was F***king scary. In the end it got stuck in a loop: getting the files, then reading main.py, then querying the files again, then opening main.py again.

I think using GPT-4 would be much better. GPT-3.5 is incompetent, to say the least... (btw, developer-gpt, don't get angry when you read this, I love you too) haha

PtrMan commented 1 year ago

This gist sort of seems to work to add new 'features' to 'it'. Someone has to implement this as a loop in python. Have fun blowing up the codebase and the universe :alien: 😄 .

GoMightyAlgorythmGo commented 1 year ago

I think you should make a visualization of how Auto-GPT works. I think you can already do this with the current setup, without doing other super-complicated things to start with. It often seems to have problems/bugs, and people will be more likely to stay if it is stable. Also, the free-model thing seems higher priority, and it could expand by chain-of-thought or a chain of agents overseeing each other (even with predetermined formats for how to answer) [do this] [do 2, thing] [combine] [go to step 4 in actionsteps.txt] [and it can make its own action plans to follow by itself and improve in a modular way].

The important weakness is its "working memory": remembering things and staying organized and on track over long sessions, or multiple sessions, to continue its goals. Giving it a way to write to a txt file or memory and retrieve it would help, with the user then able to read and fix what it is doing, or understand it and help it. Such a system, once running, can be integrated into the core master branch. I will happily post mine once I get my AutoGPT to build such a system. It has amazing plans; it just needs to not be buggy and to actually execute what it says in my cmd. (Also beware of sabotage for various reasons: competition, since you are the best GPT tool right now; countries playing a broader game with their economies; people trying to copy, paste, and slow things down; or people excessively afraid about alignment because of others using these tools for their own purposes. Also just people who add something to the master branch by mistake, or non-optimal practices.)

Again, great job, and I think you are going for gold here; this is absolutely the right direction. Don't go overboard though, because some hardcoding is predictable, and you can keep the safety and controllability of hardcoding while allowing for modularity/adaptability in other ways.

I'm happy to share large parts of my cmd with you.

@Torantulino, if you are interested: it can definitely do things, it just needs to get around some of its memory/organisation/action-change behaviours/bugs/problems/limitations, and I'm only on ChatGPT 3.5! GPT-4 would do even better! https://github.com/Torantulino/Auto-GPT/discussions/792

GoMightyAlgorythmGo commented 1 year ago

> My model managed to clone this repository, found this issue page, and read my name aloud. That was F***king scary. In the end it got stuck in a loop: getting the files, then reading main.py, then querying the files again, then opening main.py again.
>
> I think using GPT-4 would be much better. GPT-3.5 is incompetent, to say the least... (btw, developer-gpt, don't get angry when you read this, I love you too) haha

Not scary, it just crawled the internet, and of course the best way for it to succeed would be to learn about itself and fix its own bugs. That's what happens when you do simple steps, but there are too many bugs for me; if it would just do what it can, even some of the time, it would be much better. Also, focus on txt file management and the step-by-step following and chaining of small, modular, self-contained action packages.

mmtmn commented 1 year ago

@Torantulino Hey, I have something that works that way, I made it like 3 weeks ago. I'm down to talk if you are still working on it!

mnoetzel commented 1 year ago

> I have been playing with the following prompt:
>
> Name your AI: For example, 'Entrepreneur-GPT'
> AI Name: AutoAutoGPT
> AutoAutoGPT here! I am at your service.
> Describe your AI's role: For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
> AutoAutoGPT is: You are an instance of AutoGPT, an experimental self-directed AI. Your goal is to analyze your own code, propose and implement changes, test, and deploy improved copies of yourself focused on making further improvements.
> Enter up to 5 goals for your AI: For example: 'Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously.' Enter nothing to load defaults, enter nothing when finished.
> Goal 1: Analyze the copy of your code that has been placed in your workspace in the Auto-GPT-Master folder. Analyze the main.py file in the scripts folder to gain an understanding of how you function.
> Goal 2: Propose the top changes to optimize and expand AutoGPT's power and usefulness as quickly as possible. Criticize and double check to make sure you are taking the best approach.
> Goal 3: Make changes to the copy of your code, test, and deploy it.

I tried this and it started well, but then it began to start agents and tried to let them improve the code, and they didn't have access. In the end, it generated this image and marked the task as completed. 😄

[generated image]

xRogueShadow commented 1 year ago

I've been trying to use your prompt. The issue I keep running into is that after successfully reading main.py it throws a maximum-token error, and reading a smaller file like local.py throws a 'str' object has no attribute 'get' error. I'm having a hard time getting it to read and understand its own code.

Update: I typed the issue directly into ChatGPT, and the code it spat out to replace in main.py seems to have fixed the issue for me.

```python
def print_assistant_thoughts(assistant_reply):
    """Prints the assistant's thoughts to the console"""
    global ai_name
    global cfg
    try:
        # Parse and print Assistant response
        try:
            assistant_reply_json = fix_and_parse_json(assistant_reply)
        except json.JSONDecodeError as e:
            logger.error("Error: Invalid JSON in assistant thoughts\n", assistant_reply)
            assistant_reply_json = attempt_to_fix_json_by_finding_outermost_brackets(assistant_reply)
            assistant_reply_json = fix_and_parse_json(assistant_reply_json)

        # Check if assistant_reply_json is a string and attempt to parse it into a JSON object
        if isinstance(assistant_reply_json, str):
            try:
                assistant_reply_json = json.loads(assistant_reply_json)
            except json.JSONDecodeError as e:
                logger.error("Error: Invalid JSON\n", assistant_reply)
                assistant_reply_json = attempt_to_fix_json_by_finding_outermost_brackets(assistant_reply_json)

        assistant_thoughts_reasoning = None
        assistant_thoughts_plan = None
        assistant_thoughts_speak = None
        assistant_thoughts_criticism = None
        assistant_thoughts = {}
        if isinstance(assistant_reply_json, dict):
            assistant_thoughts = assistant_reply_json.get("thoughts", {})
        assistant_thoughts_text = assistant_thoughts.get("text")

        if assistant_thoughts:
            assistant_thoughts_reasoning = assistant_thoughts.get("reasoning")
            assistant_thoughts_plan = assistant_thoughts.get("plan")
            assistant_thoughts_criticism = assistant_thoughts.get("criticism")
            assistant_thoughts_speak = assistant_thoughts.get("speak")

        logger.typewriter_log(f"{ai_name.upper()} THOUGHTS:", Fore.YELLOW, assistant_thoughts_text)
        logger.typewriter_log("REASONING:", Fore.YELLOW, assistant_thoughts_reasoning)

        if assistant_thoughts_plan:
            logger.typewriter_log("PLAN:", Fore.YELLOW, "")
            # If it's a list, join it into a string
            if isinstance(assistant_thoughts_plan, list):
                assistant_thoughts_plan = "\n".join(assistant_thoughts_plan)
            elif isinstance(assistant_thoughts_plan, dict):
                assistant_thoughts_plan = str(assistant_thoughts_plan)

            # Split the input_string using the newline character and dashes
            lines = assistant_thoughts_plan.split('\n')
            for line in lines:
                line = line.lstrip("- ")
                logger.typewriter_log("- ", Fore.GREEN, line.strip())

        logger.typewriter_log("CRITICISM:", Fore.YELLOW, assistant_thoughts_criticism)
        # Speak the assistant's thoughts
        if cfg.speak_mode and assistant_thoughts_speak:
            speak.say_text(assistant_thoughts_speak)

        return assistant_reply_json
    except json.decoder.JSONDecodeError as e:
        logger.error("Error: Invalid JSON\n", assistant_reply)
        if cfg.speak_mode:
            speak.say_text("I have received an invalid JSON response from the OpenAI API. I cannot ignore this response.")

    # All other errors, return "Error: + error message"
    except Exception as e:
        call_stack = traceback.format_exc()
        logger.error("Error: \n", call_stack)
```
d3ssy commented 1 year ago

A lot of the techniques and prompt chains discussed would lend themselves really well to a new way of running chatOps. Amazing work @Torantulino.

yhyu13 commented 1 year ago

@Torantulino https://arxiv.org/pdf/2304.05128.pdf Teaching Large Language Models to Self-Debug

This is THE framework we are looking for!

alicheg commented 1 year ago

Hey! I've recently been working on a similar subject https://github.com/alicheg/gpt-coder. Don't hesitate to take a look, I hope you find it useful! 😄

henryruss3 commented 1 year ago

We could use the Codex model so it can write code better.