Auto-GPT Recursive Self Improvement

Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

https://agpt.co

MIT License

163.75k stars 43.5k forks source link

Auto-GPT Recursive Self Improvement #15

Open Torantulino opened 1 year ago

Torantulino commented 1 year ago

Idea 💡

The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself. That, after-all, is how AGI is predicted by many to come about.

Suggestion 👩‍💻

Auto-GPT should be able to:

[ ] Read it's own code
[ ] Evaluate it's limitations and areas for improvement
[ ] Write code to increase it's abilities
[ ] Write tests for it's code and carry out those tests

Further down the line: 📃

[ ] Browse it's own code on GitHub
[ ] Evaluate, find bugs, etc
[ ] Submit pull requests

Where to start? 🤔

I have previously had success with this system prompt in playground:

Prompt

You are AGI_Builder_GPT. Your goal is to write code and make actionable suggestions in order to improve an AI called "Auto-GPT", so as to broaden the range of tasks it's capable of carrying out.

Emasoft commented 1 year ago

This simple python script for GPT4 seems to work better than Auto-GPT at debugging and automatically fixing python code. Maybe it could be useful to study it: SebRinnebach autodebug

philliplastcall commented 1 year ago

Hey?!?! What's the point of having memory if it has no meta data to determine its actual relevance? pine cone is just storing everything flat and spitting out the most common memories as relevant. No wonder it keeps getting stuck doing the same shit over and over.

Boostrix commented 1 year ago

The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself.

nice idea, but I suppose the first step would be teaching Auto-GPT to write/develop and maintain some of its own plug-ins ? That way, the task would be much more self-contained and isolated, and one could use a repository of plug-ins for training purposes, in conjunction with textual descriptions / documentation.

Being able to extend Auto-GPT by telling it to write/adapt plug-ins, could be a good step in the right direction. In fact, there could be a custom command just for that - so that people could interactively select a certain mode during startup and execute that command.

This way, all sorts of custom context could be pre-generated, i.e. directory of plug-ins, directory of docs, directory of examples / APIs etc.

This sort of feature would probably cause a plethora of plug-ins to be generated within just a manner of weeks

Obviously, being able to version plug-ins and to encode mutual dependencies / compatibility issues, would be pretty important with this sort of functionality, especially if plug-ins could be chained together arbitrarily

I think what we need is a brain-loop function. I heard that from an interview with Lex Friedman. In programming a loop is needed to execute repeatetly tasks. The tasks in the loop can be dynamically changed. I know, we have this while-loop in the application.

What I mean is a loop in the task-management to align and self-correct the task-list. Later on, we can let the AI have access to write plugins for AutoGPT, which can be activated by the machine self. Then we are at meta-programming.

Some folks already mentioned planning/tasking and scheduling issues - and in fact, with this degree of sophistication, you could just as well use actual Makefiles (or ninja projects) behind the scenes - which would mean that dependencies would be neatly encoded that way, and you'd get concurrency, too

zachary-kaelan commented 1 year ago

@Boostrix I suggested something similar in #36. Basically a repo of Python files, and a vector database for descriptions of those files as well as various public Python libraries.

But with the way plugins are implemented, it would actually be pretty damn easy to get AutoGPT to develop a plugin and then add it to itself at runtime. The optimal way to do it I think would be to take the template and convert it to a more compact pseudocode representation. You would give that compact template to GPT along with a description of the plugin and ask it to generate pseudocode or text descriptions of all the template and implementation functions, before then iterating over those descriptions and having it properly implement each in turn.

Boostrix commented 1 year ago

this could in fact be a dedicated plugin in and of itself - but like you said, with read-only access to the directory of plug-ins available locally to use it as reference, and in combination with a template plug-in that contains stubs that are to be filled in incrementally. Like you said, descriptions would be useful to to make head and tails of the code inside plugins, but it might work to simply expect/require plug-ins to contain python-style doc strings and use those for the AI part to adapt plugin fragments as needed.

Context-wise, it would obviously help enormously if plugins/their descriptions could contain URLs to the corresponding APIs/documentation and references - that way, there would always be surrounding context for each plugin and its functionality (functions/methods) that could be resolved/looked up if necessary.

Alternatively, the "meta plugin" (to develop plugins) should ask for a bunch of related articles/resources or API references that can be used to look up required APIs and code snippets

In turn, however, if a plugin/function is lacking docs, Auto-GPT could obviously try to annotate the corresponding code to provide sufficient context for a later stage (even if only in-memory)

zachary-kaelan commented 1 year ago

@Boostrix

I gave GPT-4 this lengthy prompt to see how well it could recreate the email plugin, as a test. It got surprisingly close on the first try. Here is the full response. It didn't even require any API docs.

The total template size was 1,067 tokens and the full prompt was 1,746 tokens total. Its response was 1,220 tokens. So simpler plugins could potentially be created all at once within Davinci or GPT-3.5's context window as well.

With some modifications, we may be able to straight-up throw that prompt into GPT-4 and get it to write up the AutoGPTPluginPlugin.

Boostrix commented 1 year ago

there are probably several possibilities to get around context window restrictions - either like you said by coming up with a DSL/domain specific language for the creation of such plugins (which in itself could be an autogpt-dsl-plugin) or alternatively something that would split up a plugin across several self-contained files, so that the context window would not be an issue

The first step would probably be, using Auto-GPT to manage its own github project (via the github API), as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/1870#issuecomment-1527959646

That way, pull requests and feature requests could be better triaged/reviewed and managed. And some contributors could even use Auto-GPT to implement custom functionality/plug-ins - as long as these fit into the 8k context window obviously

zachary-kaelan commented 1 year ago

It would probably need a "code documentation" vector database that contains embeddings for stuff in the AutoGPT source code, everything in the code it writes that is outside of the context window, and maybe various things from API docs. Then something vaguely similar to tooltips and autocompletion is added, where queries are run against the vector database every word and the top few matches are added to the context window. The database is compressed and packaged with the plugin for use by agents doing code reviews.

Boostrix commented 1 year ago

FWIW, I just mentioned this in another discussion here: When I edited git_operations.py to add a few more functions, it seems that the integration is somewhat redundant - it could probably figure out the details just by looking at the doc strings of the various commands: http://codepad.org/xmmmwFWp

remriel commented 1 year ago

Is it possible to add a default folder in the workspace and a function so that autogpt will always read the data from it? For things like API keys, context, maybe PDF files? This can be loaded into memory automatically without needing to run any other commands and include in the documentation for example, "You can add files into the 'Context' folder and AutoGPT will know to read from it when it needs an API key or more context for an action."

Or alternatively add to the .env file a "Misc" API keys section that AutoGPT can read from

It seems to be hit or miss instructing it to read a text file in its workspace in the prompt, and it would be better to have something more reliable that's built in.

Boostrix commented 1 year ago

It would probably need a "code documentation" vector database that contains embeddings for stuff in the AutoGPT source code, everything in the code it writes that is outside of the context window, and maybe various things from API docs.

I did tinker with this today, and at least in principle it seems to be quite capable of creating new commands and plugins by adapting existing ones - for instance, adapting git_operations to add more operations would work "as is" with sufficient context. And at least for the time being, you need to create those modified files inside the workspace and copy them manually to the proper location and restart Auto-GPT. Next, I tried to use the same git_operations.py file to bootstrap a simple svn/hg integration, but that failed immediately - however, it seems that this might be a question of providing different/proper contextual info.

Overall, it really doesn't seem that far-fetched to use some of its sources as templates (ideally in conjunction with some docs) and then bootstrap templates for commands and plugins. Also, to be on the safe side, it would make sense to base such a feature of unit tests, i.e. come up with a test first for each command/plugin and then take it from there by incrementally executing all tests, evaluating the error messages/warnings and adapting the template as needed to get rid of all remaining issues.

As mentioned in the other issue, I extended the git_operations.py file to also init/create repos to be able to track each change, so see what is going on after the fact - and it's actually coming up with plausible changes AND commit messages.

willguest commented 1 year ago

I can almost guarantee that OpenAI is working on something like this since, as stated at the beginning, this is a well-known route to AGI and their stated purpose, for many years, is getting there.

It seems like a macro recording function would be a useful extension of the memory system as the machine needs to build modular workflows that would save time when repeated. In my own efforts I am looking at having more control over memory functions in an attempt to create a kind of recollection function that is based on summaries of certain memory clusters or events. By getting it to give a 1-sentence summary attached to each event/cluster/macro would allow for thematic analysis, potentially connecting previously unrelated tasks, simply because they have a summary that is suggestive. This is faulty, of course, but could be used for speculation or inference.

If I were to organise efforts on a grander scale (which I don't yet have time for) I would look at which resources could be adapted to each stage of the CI pipeline, including those that involve humans, and seeing how to represent these in terms of auto-gpt's current debate structure, which I love. Just my two pennies.

Boostrix commented 1 year ago

It seems like a macro recording function would be a useful extension of the memory system as the machine needs to build modular workflows that would save time when repeated.

Indeed, the idea of introducing a dedicated concept for "workflows" would sound rather powerful - initially, we could share working workflows contributed by the community, over time these could be used for learning purposes to come up with combinations of existing ones, and new ones created from scratch.

As long as these are implementeed via some form of plain text / DSL (or just JSON), the LLM can learn from the data assuming that we can keep these under 8k (context window size restriction for now)

stukennedy commented 1 year ago

I've been developing my own Auto-GPT in a much simpler way, been getting some great results with it. Here's a couple of takeaways I have that may improve Auto-GPT.

I introduced a 2-brain concept. Left-brain, Right-brain. I used different temperature, presence_penalty and frequency_penalty variables to reflect the right (creative/optimistic etc) and the left (corrective/strict and rule-bearing). Each brain can take the messages thread and call the chat API, but has its own System command to prime it. I instructed them to work together to solve a common problem and told them their characteristics. This worked powerfully well.
I always run continuous but tell GPT to ask me questions when it needs input or clarification or direction (as a command that triggers input(question) ). This would be an excellent and simple addition to Auto-GPT, which I'll raise a PR for. It saves time with the auto thread trying to solve problems down the wrong track when it knows it needs direction.

Boostrix commented 1 year ago

Indeed, that sounds like a powerful mechanism.

Would you be able to share a simple example, like some sort of goal setting and the way its accomplishing this ? We could then use the same example to see what Auto-GPT comes up with and compare things.

In fact, I extended Auto-GPT locally to make it ponder whether a question/task is redundant because it is already solved (which frequently happens) by extending its "CRITICISM" mechanism a little. For the time being, there are frequently loops where the agent is trying to do something over and over again that it had already solved. Having a "critical" brain hemisphere to make it skip those steps, would also be interesting.

For this to work in the context of AutoGPT we'd need the equivalent of two independent agents, and possibly a third one evaluating their "conversations".

Thank you !

philliplastcall commented 1 year ago

I've been developing my own Auto-GPT in a much simpler way, been getting some great results with it. Here's a couple of takeaways I have that may improve Auto-GPT.

1. I introduced a 2-brain concept. Left-brain, Right-brain. I used different  temperature, presence_penalty and frequency_penalty variables to reflect the right (creative/optimistic etc) and the left (corrective/strict and rule-bearing). Each brain can take the messages thread and call the chat API, but has its own System command to prime it. I instructed them to work together to solve a common problem and told them their characteristics. This worked powerfully well.

2. I always run continuous but tell GPT to ask me questions when it needs input or clarification or direction (as a command that triggers input(question) ). This would be an excellent and simple addition to Auto-GPT, which I'll raise a PR for. It saves time with the auto thread trying to solve problems down the wrong track when it knows it needs direction.

I'd like to see this

philliplastcall commented 1 year ago

Indeed, that sounds like a powerful mechanism.

Would you be able to share a simple example, like some sort of goal setting and the way its accomplishing this ? We could then use the same example to see what Auto-GPT comes up with and compare things.

In fact, I extended Auto-GPT locally to make it ponder whether a question/task is redundant because it is already solved (which frequently happens) by extending its "CRITICISM" mechanism a little. For the time being, there are frequently loops where the agent is trying to do something over and over again that it had already solved. Having a "critical" brain hemisphere to make it skip those steps, would also be interesting.

For this to work in the context of AutoGPT we'd need the equivalent of two independent agents, and possibly a third one evaluating their "conversations".

Thank you !

Having a memory with operating context would solve a lot of this as well

Boostrix commented 1 year ago

Auto-GPT can already fire up other agents - so maybe, a starting point would be to try the left/right brain thing with two sub-agents, and then let the top-level agent decide how to pursue its goals

Boostrix commented 1 year ago

I did tinker with this today, and at least in principle it seems to be quite capable of creating new commands and plugins by adapting existing ones - for instance, adapting git_operations to add more operations would work "as is" with sufficient context.

turns out, with very little hand-holding, Auto-GPT is even capable of creating scripts based on reading the official github API docs to access github issues in a scripted fashion, as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/1454#issuecomment-1509474337

In other words, if people are feeling this might be worthwhile to pursue, it's relatively straightforward to get Auto-GPT to parse the github API docs https://docs.github.com/en/rest/quickstart?apiVersion=2022-11-28&tool=curl and come up with curl snippets to access project specific issues, which could in turn be turned into dedicated commands or rather a "github" plugin.

The following is what it created for me locally in under 60 seconds inside the workspace: curl --request GET \ --url "https://api.github.com/repos/Significant-Gravitas/Auto-GPT/issues" \ --header "Accept: application/vnd.github+json" \ --header "Authorization: Bearer $(cat github.token)"

(The assumption here being that you put this into a directory, next to a file named "github.token" that contains a token to access the API, as per: https://github.com/settings/tokens )

And it even came up with a suggestion for initial heuristics to help prioritize pull requests/issues:

To prioritize pull requests and issues effectively, here are some heuristics based on different criteria:

Activity level: Pull requests and issues that have a high level of activity should be prioritized. High activity could include many comments, reactions, and changes. This could be an indicator of the importance of the issue or pull request.

Severity of the issue: Prioritize high-severity issues that have a significant impact on the project or its users. For example, a security vulnerability or a critical bug that affects functionality.

Number of distinct users commenting: Issues and pull requests that have more distinct users commenting should be given priority. More people commenting can mean that the issue or pull request is relevant to a larger audience or may require more attention.

Complexity of the patch: Pull requests that touch many files or introduce new ones should be given a higher priority, as they can potentially affect a larger part of the project.

Time sensitivity: Issues and pull requests that have time sensitivity, such as blocking the release of a new version or a critical issue in a production environment, should be given priority over others.

Importance to the project: Pull requests and issues that address critical features or long-standing issues should be given higher priority than those that address less critical features or minor bugs.

Behind the scenes, pull requests are also just issues, the API is covered here: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28

Essentially, we'd be getting a JSON dump of all issues matching certain criteria and could then sort those using a script according to more detailed criteria, i.e. in line with some heuristics - Auto-GPT would then be able to label issues accordingly.

And there you have it: Your very first step towards "self-improvement" (in the form of the project being able to manage itself by agreeig on a set of heuristics/goals) ...

gran4 commented 1 year ago

How would this work? Is it a separate system running?

(Side note) Also, it shouldn't have write permissions.

WellnessSupplyCo commented 1 year ago

/> > I did tinker with this today, and at least in principle it seems to be quite capable of creating new commands and plugins by adapting existing ones - for instance, adapting git_operations to add more operations would work "as is" with sufficient context.

turns out, with very little hand-holding, Auto-GPT is even capable of creating scripts based on reading the official github API docs to access github issues in a scripted fashion, as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/1454#issuecomment-1509474337

In other words, if people are feeling this might be worthwhile to pursue, it's relatively straightforward to get Auto-GPT to parse the github API docs https://docs.github.com/en/rest/quickstart?apiVersion=2022-11-28&tool=curl and come up with curl snippets to access project specific issues, which could in turn be turned into dedicated commands or rather a "github" plugin.

The following is what it created for me locally in under 60 seconds inside the workspace:

`curl --request GET \

--url "https://api.github.com/repos/Significant-Gravitas/Auto-GPT/issues" \

--header "Accept: application/vnd.github+json" \

--header "Authorization: Bearer $(cat github.token)"

`

(The assumption here being that you put this into a directory, next to a file named "github.token" that contains a token to access the API, as per: https://github.com/settings/tokens )

And it even came up with a suggestion for initial heuristics to help prioritize pull requests/issues:

To prioritize pull requests and issues effectively, here are some heuristics based on different criteria:

Activity level: Pull requests and issues that have a high level of activity should be prioritized. High activity could include many comments, reactions, and changes. This could be an indicator of the importance of the issue or pull request.

Severity of the issue: Prioritize high-severity issues that have a significant impact on the project or its users. For example, a security vulnerability or a critical bug that affects functionality.

Number of distinct users commenting: Issues and pull requests that have more distinct users commenting should be given priority. More people commenting can mean that the issue or pull request is relevant to a larger audience or may require more attention.

Complexity of the patch: Pull requests that touch many files or introduce new ones should be given a higher priority, as they can potentially affect a larger part of the project.

Time sensitivity: Issues and pull requests that have time sensitivity, such as blocking the release of a new version or a critical issue in a production environment, should be given priority over others.

Importance to the project: Pull requests and issues that address critical features or long-standing issues should be given higher priority than those that address less critical features or minor bugs.

Behind the scenes, pull requests are also just issues, the API is covered here: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28

Essentially, we'd be getting a JSON dump of all issues matching certain criteria and could then sort those using a script according to more detailed criteria, i.e. in line with some heuristics - Auto-GPT would then be able to label issues accordingly.

And there you have it: Your very first step towards "self-improvement" (in the form of the project being able to manage itself by agreeig on a set of heuristics/goals) ...

This actually is brilliant and could even spoon feed auto-gpt api & instructions on how to fix it all through GitHub itself. Parse an issue, return the object, Inside comments or on the issue; commands can be made.

Heres an example output from a basic script that pulls all the data from githubs API and then prioritizes it. It's not very good tbh but it's still something in that direction.

https://pastebin.com/dgHKB3Dg

Boostrix commented 1 year ago

How would this work? Is it a separate system running? no, it's just using the existing method of an ai_settings.yaml file and an empty workspace with API docs and examples. From then on, it figures out on its own how to obtain issues and pull requests. Ideally, we'd be using different agents for each step - so that these steps don't confuse each other:

one agent to set up the script to fetch issues/pulls from github
the next agent to adapt this for a specific project (Auto-GPT in this case)
a new agent that turns heuristics into a script to process the JSON dump

to be fair, it did get stuck at the required token, I had to ask it how to obtain a github token - but the rest went well. Also, it does help to copy examples from the API docs as plain text files into the workspace and reference those examples via the ai_settings.yaml file.

So, obtaining the JSON dump of all issues/pulls is basically "solved". The next step would be a script to process this JSON dump and apply the heuristics. The heuristics themselves could be part of the github repo, so that collaborators could easily adapt those as needed. Locallly, each collaborator would use a dedicated token when running this to help prioritize/manage issues and pulls.

But under the hood, these agents would basically be collaborating because they would be using the same heuristics.

Heres an example output from a basic script that pulls all the data from githubs API and then prioritizes it. It's not very good tbh but it's still something in that direction

right, that's looking good - and it doesn't matter if it's not that good, the idea is to come up with something to "bootstrap" the missing portion/rest of it.

I also went through the process of using a standalone bash script initially - even though asking the system to come up with new commands/plugins would feel more "natural" (and more powerful!). For the time being, adding a new command is not that well isolated, because you basically need to touch 3 different places (files). I tried to let it manipulate a diff to automate this portion, but that failed because a diff needs the hash/crc of the modified files. Probably, teaching the system how to come up with new plugins and actually test those, would be simpler - as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/15#issuecomment-1527801074

the general thinking would be: a pull request has more weight than issue, because it contains a potential solution
pull requests that don't modify code, but only docs or comments are prioritized
pull requests that improve compatibility with different platforms are prioritized to ensure that the community keeps growing
a pull request that has multiple authors has more weight, because it's got more potential manpower
a pull request that's been updated frequently, is more active / interesting because it's more responsive
a pull request that modifies few files/places and whose diff is small, is less work to review
a pull request without conflicts is more "interesting" to the system
a pull request that touches files/places that are touched by other pulls need to be labelled as such, because there are potential conflicts down the road - so these are not prioritized, just re-labelled to encourage the contributors to collaborate

Note that these "heuristics" are not meant to be set in stone, rather they're intended to be used to "prime" a script/ai_settings.yaml file (agent) to come up with heuristics to help organize pulls and issues accordingly.

If anybody would like to collaborate on this, I would be interested in teaming up to come up with a rough prototype to see where that's taking us. My plan of action would be to use the system itself to automate the major portion of it.

So, my current thinking would be:

use Auto-GPT itself to bootstrap missing features (initially scripts)
the next step would be to turn those scripts into dedicated commands or plugins for Auto-GPT
Specifically, my suggestion would be to come up with a "github" plugin (probably outside the source tree, in order not be constrained by the bottleneck of reviews/pull requests here)
once the github plugin for Auto-GPT works well enough to fetch open pulls/issues, we would be using dedicated sub-agents for remaining tasks

specifically:

we would be using a new yaml file to specify heuristics to be used to manage pulls/issues
this yaml file would contain plain text heuristics so that collaboratos can easily manage/update and refine those as needed
there would be a sub-agent to open this yaml file and to traverse all open issues/pulls to try to organize open pulls/issues
the initial focus will be on managing pulls and making suggestions to the project in the form of labels and comments

requirements:

contributors to this effort should have access to the OpenAI API
should not be bothered by "donating" some of their API credits to this project by letting the plugin play with open pulls/issues to help streamline relevant pulls/issues: https://github.com/Significant-Gravitas/Auto-GPT/issues/1454#issuecomment-1528869458
access to a working instance of Auto-GPT
access to a github token: https://github.com/settings/tokens
ideally have some experience working with git, github and Python (initially in a Linux environment)

Note that contributor will not need to be expert coders, the idea really is to use the system to self-improve itself by running proper agents to create missing functionality. This will be accomplished by setting up new workspace directories that contain API docs, examples, code snippets in addition with a corresponding ai_settings.yaml file.

My suggestion would be to come up with a mechanism to use "virtual workspaces" where an empty directory is checked for an ai_settings.yaml file which in turn creates a sub-folder for all outputs. That way, we could easily share our agents and even manage them via git. So, the point being to collaboratively edit a repository of "agents" and sub-agents that contain workspaces with examples, API docs, code snippets etc to create missing functionality. We would only be editing those agents (ai_settings.yaml) and provide a "references" directory that contains examples and documentation.

willguest commented 1 year ago

it does help to copy examples from the API docs as plain text files into the workspace and reference those examples via the ai_settings.yaml file

This made me think that an 'Exemplifier' agent would be handy. Good prompts often rely on accurate examples, so having an agent tasked with finding them (in official docs, etc.) could have a holistic benefit

Boostrix commented 1 year ago

That's a good point, I kinda handled that portion of the job by downloading API docs and extracting examples, and to simplify the whole task, I put things into different files with proper names and file extensions. I even introduced variables as placeholders for things that obviously needed replacing/changes, which made the whole thing rather rapid/successful (definitely moreso than me doing everything on my own)

Also, the quality of an "exemplifier agent" would be measured by the number of examples that it can come up which in turn can be used by sub-agents to solve actual tasks. Basically like test driven development (TDD) and unit testing done in reverse.

the quality of examples would then be defined by the number of dependencies required, the number of files/changes involved etc

and such an exemplifier could just as well look for related git repositories, check those out and look at the commit history (patches + log message) to come up with training data. Possibly using an intermediate LLM to convert queries into actionable diffs/patches

zachary-kaelan commented 1 year ago

We won't actually need API docs for commonly used libraries. I asked ChatGPT-4 straight-up how to manage GitHub using the API and it thoroughly explained it, with placeholders and an example and everything, also linking the page to generate an access token. One thing we could implement is a Rubber Duck agent which the main agent asks for help to get things out of its own training material.

We're starting to get into multi-agent systems here, and the Wikipedia page actually has Auto-GPT in the list of related articles.

Boostrix commented 1 year ago

Sure, GPT-4 is extremely powerful ...

But I suppose, it still isn't a bad idea to keep in mind the underlying workflow that works best - especially if/when GPT-4 availablity should become a factor. Auto-GPT being tied to GPT4/OpenAI currently makes a ton of sense - but over the course of the next 6 months, a ton of GPT-like LLMs are going to show up - and being able to "switch" LLMs (remote ones or even local ones) will be useful - and some of these may not have had the sort of horsepower / computing power behind it that ChatGPT4 had (still has).

One thing we could implement is a Rubber Duck agent which the main agent asks for help to get things out of its own training material.

That is an observation that I made locally: using sub-agents to distribute well-defined tasks across multiple agents worked better than letting a single agent handle everything on its own. Also, I used "rubber ducking" to implement a poor man's version of persistent state by splitting up tasks into unit tests (shell scripts) see if some steps could be stepped in between invocations of the same agent that had previously bailed out - in order to avoid having to re-create a previously created directory/file or re-download certain files: https://github.com/Significant-Gravitas/Auto-GPT/issues/3382#issuecomment-1529776848

The basic idea is to use a unit_test[key]=value hash table/dictionary where the MD5 hash of the prompt is the key into the hash to obtain the value, which would be the shell script (command/plugin) to execute to see if the current instruction can be skipped or not.

WellnessSupplyCo commented 1 year ago

How would this work? Is it a separate system running?

no, it's just using the existing method of an ai_settings.yaml file and an empty workspace with API docs and examples.

From then on, it figures out on its own how to obtain issues and pull requests.

Ideally, we'd be using different agents for each step - so that these steps don't confuse each other:

one agent to set up the script to fetch issues/pulls from github

the next agent to adapt this for a specific project (Auto-GPT in this case)

a new agent that turns heuristics into a script to process the JSON dump

to be fair, it did get stuck at the required token, I had to ask it how to obtain a github token - but the rest went well.

Also, it does help to copy examples from the API docs as plain text files into the workspace and reference those examples via the ai_settings.yaml file.

So, obtaining the JSON dump of all issues/pulls is basically "solved".

The next step would be a script to process this JSON dump and apply the heuristics.

The heuristics themselves could be part of the github repo, so that collaborators could easily adapt those as needed.

Locallly, each collaborator would use a dedicated token when running this to help prioritize/manage issues and pulls.

But under the hood, these agents would basically be collaborating because they would be using the same heuristics.

Heres an example output from a basic script that pulls all the data from githubs API and then prioritizes it. It's not very good tbh but it's still something in that direction

right, that's looking good - and it doesn't matter if it's not that good, the idea is to come up with something to "bootstrap" the missing portion/rest of it.

I also went through the process of using a standalone bash script initially - even though asking the system to come up with new commands/plugins would feel more "natural" (and more powerful!). For the time being, adding a new command is not that well isolated, because you basically need to touch 3 different places (files). I tried to let it manipulate a diff to automate this portion, but that failed because a diff needs the hash/crc of the modified files.

Probably, teaching the system how to come up with new plugins and actually test those, would be simpler - as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/15#issuecomment-1527801074

the general thinking would be: a pull request has more weight than issue, because it contains a potential solution

pull requests that don't modify code, but only docs or comments are prioritized

pull requests that improve compatibility with different platforms are prioritized to ensure that the community keeps growing

a pull request that has multiple authors has more weight, because it's got more potential manpower

a pull request that's been updated frequently, is more active / interesting because it's more responsive

a pull request that modifies few files/places and whose diff is small, is less work to review

a pull request without conflicts is more "interesting" to the system

a pull request that touches files/places that are touched by other pulls need to be labelled as such, because there are potential conflicts down the road - so these are not prioritized, just re-labelled to encourage the contributors to collaborate

Note that these "heuristics" are not meant to be set in stone, rather they're intended to be used to "prime" a script/ai_settings.yaml file (agent) to come up with heuristics to help organize pulls and issues accordingly.

If anybody would like to collaborate on this, I would be interested in teaming up to come up with a rough prototype to see where that's taking us. My plan of action would be to use the system itself to automate the major portion of it.

So, my current thinking would be:

use Auto-GPT itself to bootstrap missing features (initially scripts)

the next step would be to turn those scripts into dedicated commands or plugins for Auto-GPT

Specifically, my suggestion would be to come up with a "github" plugin (probably outside the source tree, in order not be constrained by the bottleneck of reviews/pull requests here)

once the github plugin for Auto-GPT works well enough to fetch open pulls/issues, we would be using dedicated sub-agents for remaining tasks

specifically:

we would be using a new yaml file to specify heuristics to be used to manage pulls/issues

this yaml file would contain plain text heuristics so that collaboratos can easily manage/update and refine those as needed

there would be a sub-agent to open this yaml file and to traverse all open issues/pulls to try to organize open pulls/issues

the initial focus will be on managing pulls and making suggestions to the project in the form of labels and comments

requirements:

contributors to this effort should have access to the OpenAI API

should not be bothered by "donating" some of their API credits to this project by letting the plugin play with open pulls/issues to help streamline relevant pulls/issues: https://github.com/Significant-Gravitas/Auto-GPT/issues/1454#issuecomment-1528869458

access to a working instance of Auto-GPT

access to a github token: https://github.com/settings/tokens

ideally have some experience working with git, github and Python (initially in a Linux environment)

Note that contributor will not need to be expert coders, the idea really is to use the system to self-improve itself by running proper agents to create missing functionality. This will be accomplished by setting up new workspace directories that contain API docs, examples, code snippets in addition with a corresponding ai_settings.yaml file.

My suggestion would be to come up with a mechanism to use "virtual workspaces" where an empty directory is checked for an ai_settings.yaml file which in turn creates a sub-folder for all outputs. That way, we could easily share our agents and even manage them via git.

So, the point being to collaboratively edit a repository of "agents" and sub-agents that contain workspaces with examples, API docs, code snippets etc to create missing functionality. We would only be editing those agents (ai_settings.yaml) and provide a "references" directory that contains examples and documentation.

Ok; I'm actually down for this one. My whole point of being on git and this community is to learn more Python and Git/GitHub. I've been pondering this as well and for initial testing for plugin creation I believe it would be a lot easier to create a throwaway GitHub account & let our bots play with some accounts. I think initially we can have auto gpt work on assigned issues/pull requests and test it by feeding instructions. In my understanding a PR/issue are functionally the same? Obviously a PR would hold more weight but we want the bot to submit PR's right? Not approve? Or are we starting off with functionality to manage the codebase first?

I have also been playing around with some prompts to get functional plug-ins spit out by gpt4; turns out you don't even really need to feed much info into gpt for it to work; the definition for the post_prompt function and the definition of the class file seems to allow it to make a plugin that's about 90% complete as you were sayin(minus some basic error handling) so I don't really see a need for huge prompts in this context. I have found that telling to to give you a basic framework is even better as it lays out the classes and then in your next prompts you can iterate though the unfinished functions and complete them. What about this? Laying out a class and then using the evaluate_code function to improve?

Feeding an entire file to gpt and saying "give me extra functions for this class" also works extremely well, but I see the issue of it not being able to create enough functionality because it cannot realistically know any other portion of the code.

Now what about chunking the code base into pinecone? Does the token limit effect what information it can retrieve from the DB or does it just effect its context limit?

Boostrix commented 1 year ago

I suppose, for starters, it would be a passive/readonly thing to pull issues and then work with the resulting JSON file. Downloading the JSON file is trivial to do already, so meat of it will be in using a new agent to map heuristics to code (probably Python?) to parse the JSON and determine relationships between different issues/pulls.

Initially, that will probably mean looking at titles/descriptions and comments to look up relationships between pulls/issues and come up with a weighing mechanism.

Ideally, this "script" would then come up with low-hanging fruits, i.e. things that someone could tackle - the success of which could be measured by another agent being able to look at an issue and actually solve it.

It is possible that we will need to create fake issues/pulls to provide contextual information - for instance, to add a new command or plugin to Auto-GPT. Once that data is part of the JSON dump, the agent could use this "experience" to implement new commands/plugins accordingly.

In my understanding a PR/issue are functionally the same? Obviously a PR would hold more weight but we want the bot to submit PR's right? I suppose, that would be the end goal. The first goal would be to parse the JSON dump and learn the dependencies/relationships between different pulls and issues, possibly in relation to the underlying git repo. It being able to automatically identify similar/related issues could already be hugely useful, even without it writing any code. For instance, see: https://github.com/Significant-Gravitas/Auto-GPT/discussions/3630 (This is only about a handful of issues, each of which have at least 5+ dupes currently)

Realistically, managing the code base seems rather far fetched currently (though I have read related discussions here). For starters, the agent will need to be able to read/process pulls and issues. For instance to identify similar issues or pulls that are touching the same files.

So, I guess:

make it download the JSON dump
traverse all pulls/issues
identify related/similar issues
provide an option to label those as "likely dupe"

Apart from that, I don't think managing/modifying the code base itself is a low hanging fruit right now. Making it create custom command and plugins seems useful enough. And it is a much better isolated task, too.

I would think one of the first steps might be editing the plugin template and annotating it with a ton of comments/context. The next step might be using a templating engine to take this plugin code and insert placeholders for stubs to be filled in. That way, an agent would have a ton of surrounding context to accomplish its task in a well-defined manner. We could even make testing (unit testing) a part of the plugin framework, so that each plugin can be tested using a bunch of test cases - which in turn, the agent could use to see how much progress it's making.

Feeding an entire file to gpt and saying "give me extra functions for this class" also works extremely well, but I see the issue of it not being able to create enough functionality because it cannot realistically know any other portion of the code.

That is indeed a problem - and one that human coders also face when working with an unfamiliar code base - i.e. navigating the project and learning how everything hangs together. I believe I mentioned this elsewhere, I was thinking of using a "heatmap" - where a prompt/query would be mapped to a vector of paths/file names and line numbers (ranges) to determine the files/portions of code most likely to be of interest when implementing a new feature. This could be based on looking at the commit logs of the project and doing a reverse map: commits[commit_message] = patch

Thus, once an agent wants to add a new command/plugin, there will be a heatmap available that suggest which files/line numbers would be relevant because the words "command" and "plugin" were used in the corresponding commits, with several commits increasing the priority/weight of the corresponding files/line numbers.

Now what about chunking the code base into pinecone? Does the token limit effect what information it can retrieve from the DB or does it just effect its context limit?

have not yet played with this stuff, and I am also not familiar with it, so will let others comment on it. Personally, my thinking is that even just letting Auto-GPT support a "workflow" to look at its own commands/plugins to generate new ones using agents would be powerful enough for my needs, and it could serve as the foundation for more functionality down the road. Thus, I am not going to tinker with modifying the whole code base using agents until agents are capable of adding commands/plugins in a working fashion, and I can definitely live with such functionality being constrained by things like the context window size (even just having 30% of the context window for the actual code would suffice for my needs) - my idea is to use multiple nested agents and short/well-defined commands and plugins and message passing - analogous to Linux shell commands communicating over pipes.

Trying to get Agent-GPT to modify its own code base at this point is akin to SpaceX trying to fly to Mars next month in my opinion :-)

WellnessSupplyCo commented 1 year ago

Sure, GPT-4 is extremely powerful ...

But I suppose, it still isn't a bad idea to keep in mind the underlying workflow that works best - especially if/when GPT-4 availablity should become a factor. Auto-GPT being tied to GPT4/OpenAI currently makes a ton of sense - but over the course of the next 6 months, a ton of GPT-like LLMs are going to show up - and being able to "switch" LLMs (remote ones or even local ones) will be useful - and some of these may not have had the sort of horsepower / computing power behind it that ChatGPT4 had (still has).

One thing we could implement is a Rubber Duck agent which the main agent asks for help to get things out of its own training material.

That is an observation that I made locally: using sub-agents to distribute well-defined tasks across multiple agents worked better than letting a single agent handle everything on its own.

Also, I used "rubber ducking" to implement a poor man's version of persistent state by splitting up tasks into unit tests (shell scripts) see if some steps could be stepped in between invocations of the same agent that had previously bailed out - in order to avoid having to re-create a previously created directory/file or re-download certain files: https://github.com/Significant-Gravitas/Auto-GPT/issues/3382#issuecomment-1529776848

So why not some sort of dynamic api class that can load a json file with methods allowing it to create what ever request it can load on the fly? We can setup a public repo on GitHub where any(with approval) could upload a json file and the bot could clone/load it?

https://pastebin.com/gpjvt8qM

Boostrix commented 1 year ago

wow, I didn't understand the English portion of it, only the code - I also didn't know that was possible like this.

I was thinking along the lines of python (it being the language of choice here and for plugins). So, I guess at this point (brainstorming), it's more important to get something started/going than discussing the nitty gritty details - a working prototype has more value than a non-working concept/design document.

Thus, if we can put together some rough scripts to end up with a working prototype, that's preferable in my opinion - especially if we can live with the idea that this is at most going to be a plugin in the forseeable future.

The question remains if we can continue this discussion here, or we should rather open a new issue/topic elsewhere ?

zachary-kaelan commented 1 year ago

So why not some sort of dynamic api class that can load a json file with methods allowing it to create what ever request it can load on the fly? We can setup a public repo on GitHub where any(with approval) could upload a json file and the bot could clone/load it?

https://pastebin.com/gpjvt8qM

OpenAI has something like this in the works for ChatGPT plugins. Might be some good ideas in there.

The question remains if we can continue this discussion here, or we should rather open a new issue/topic elsewhere ?

Maybe an issue in the plugins repo?

sdfgsdfgd commented 1 year ago

people fall in love with their fantasies and all the hype.

Can you first make even a GPT-4-only AutoGPT session improve 2 go/py files of 600 lines of code by itself ? ??

Maybe you should work on helping us solve the memory issues before you fantasise about AGI, giving it personas and living the dream. Everyone is a genius when it comes to forking their own AutoGPT with left brain right brain personas, but get outside your bubble and help the main development right here about memory management - there are actual hard challenges to solve ASAP

kkurian commented 1 year ago

@sdfgsdfgd I agree with the sentiment but that's too harsh from my perspective.

Right now, no one knows exactly the bounds of what GPT-4 is capable of doing.

So, it's to all of our benefits for people who are feeling the itch to try something and see how it works and report back with details.

What's not particularly helpful, as you are pointing out and I agree with you, is when someone says they did, e.g., a left/right brain experiment without providing an actual example of it and then they say that it generated amazing results without showing an actual example of the supposedly amazing result.

Fantasy and hype are just that... until someone shows that they're not. I'd love to see someone post a PR to this issue showing that Auto-GPT recursive self improvement is not fantasy and hype but can actually be made to work.

zer0int commented 1 year ago

From the side of prompt engineering, I had a remarkable encounter yesterday, trying to implement a GradCAM for CLIP ViT - which, as opposed to CLIP GradCAM for ResNet, is a "very recent development" (see "What do Vision Transformers learn? A Visual Exploration": arXiv:2212.06727 or "Masked Unsupervised Self-training for Zero-shot Image Classification": arXiv:2206.02967).

Giving GPT-4 some code from late 2022 made it obvious: THIS is unknown / was not in GPT-4's training dataset: It hallucinated non-existing CLIP ViT layers, messed up the tensor shapes and then said "I am sorry we need to change that back and change something else", then messing up some other part to be "more broken", even implementing some Transformers ResNetConfig, seemingly mixing up "Huggingface Transformers" with "Transformer-based models" and hallucinating that this will work for a CLIP Vision Transformer.

So far, so clear.

I thus engaged in "knowledge hiding" with another instance of GPT-4, merely asking about OpenAI CLIP (that knowledge is present), receiving some very nice small python scripts that allowed me to determine the patch size for a given CLIP ViT model, and expected dimensions, etc. - I was hopeful I might just be able to "fix it myself" by just "interviewing the AI about CLIP ViT technological details". But eventually, the AI figured out I was hiding something from it, so GPT-4 eventually asked me to "hand over the code you are hiding from me already", albeit in its usual overly polite way.

And that's where my mind was blown. Drawing on previous experiments with "formal instruction to write an essay" -> classifiers detect it's AI generated -> cheering on the AI in very human ways using emojis and going like "wooohooo you can do it, you amazing AI!" and then asking for a re-write will lead to GPT-4 defeating any classifiers with "more human than human" text, I decided to just be open and honest.

Basically, I explained what happened with its previous instance, outcome code = rekt, and asked it to refrain from "guessing around" and tripping out (hallucinating) on the code.

I wasn't very hopeful this would work, as not-in-dataset stuff + GPT-4's inability to just shrug and go like "I dunno" usually leads to trouble.

I was dead wrong. GPT-4 took a humble debugger's approach. It didn't mess with the code. Based on the Traceback / Errors thrown, it asked me to just print the tensors at various parts (which it recommended / implemented) so it could look at it. Concluded that something isn't being carried out, referring to a certain "hook" for CLIP's multihead attention layer.

It refrained from guessing, asking me to hack the pytorch-grad-cam package and also print various tensors and "The function / hook X was called". It then, finally, drew careful, limited conclusions based on abundance of evidence it acquired, and finally - was able to fix the code and get it running! The not-in-dataset stuff!

Granted, it isn't working as expected for reasons unknown to me, but GPT-4 left me with a suggestion to 1. try different gradcam methods 2. try changing the color scheme 3. try different layers (albeit saying it was sure that the CLIP's final layer's multihead attention is where I want to be for CLIP GradCAM feature visualization).

This, to be quite frank, blew my mind. I felt like I was just the secretary of the AI, with my sole purpose being to "slow down the AI" due to my human copypaste speed. In other words, I would have approved 100% of actions the AI carefully carried out after being begged to "please not hallucinate and guess around".

https://twitter.com/zer0int1/status/1653094778400038912

This should give you a rough idea for how this behavior can be summarized and "prompt engineered" for AutoGPT. However, if you want the full script, I am happy to export it to a PDF. But beware: It's LOOOOONG. Hours of careful AI debugging and analyzing. Let me know if you want it, nevertheless.

Boostrix commented 1 year ago

Fantasy and hype are just that... until someone shows that they're not. I'd love to see someone post a PR to this issue showing that Auto-GPT recursive self improvement is not fantasy and hype but can actually be made to work.

obviously, it's just a fantasy for the time being. But so was hooking up GPT to your CLI ... fantasy is how this project came to be after all.

Like I said elsewhere, "self-improvement" in the actual sense is actually to be implemented any time soon. It's more likely to use Auto-GPT to help manage the project itself (by reviewing pulls and issues). And maybe by coming up with commands and plugins. Those are things that can be tackled even with current restrictions in place (context window etc).

For the time being, any patch/diff touching more than just a single file is unlikely to fit into the context window anyway, and summarization/chunking doesn't work too well once you need to retain the structure of the source code, but also need to pass in a bunch of constraints and other contextual info.

Anyway, early experiments making Auto-GPT write its own commands and plugins seemed promising.

So I suppose, over the course of the next 4-6 weeks that might actually become a feature of the project, because it's such a logical step given the plethora of open feature requests here.

In other words, you will almost certainly be able to tell it to write its own commands and plugins to do something that it previously wasn't able to do directly (feel free to ping me if that shouldn't be the case).

Probably, it won't succeed at writing any command/plugin, but it's going to be a start, and basically the foundation for self-improvement: a system that can at least partially self-improve, is already self-improving anyway - just not in the total/global sense where you'd hand it a copy of the git repo/history and a bunch of python tutorials to make to hack away.

Then again, the project is already struggling due to its sheer popularity/fame, just look at the number of open PRs - enabling contributors to easily extend the system, will probably cause even more PRs (at the very least to extend the APIs for commands/plugins)

stukennedy commented 1 year ago

people fall in love with their fantasies and all the hype.

Can you first make even a GPT-4-only AutoGPT session improve 2 go/py files of 600 lines of code by itself ? ??

Maybe you should work on helping us solve the memory issues before you fantasise about AGI, giving it personas and living the dream. Everyone is a genius when it comes to forking their own AutoGPT with left brain right brain personas, but get outside your bubble and help the main development right here about memory management - there are actual hard challenges to solve ASAP

Disagree, it's only when we think outside of the box and try different approaches that we solve these problems. To me it feels like AutoGPT is becoming over-engineered and seems to be less useful than it was near the beginning. Throwing features at the problem and abstracting them more doesn't help. I'm actually trying to solve the same problems just looking at them from a different perspective. I have some fixes I'm going to suggest, but I don't think it will help, because the main solution has moved away so far from a LLM approach that it struggles to think its way through. Here's some micro-fixes I would suggest:

I've noticed the web-crawler in AutoGPT fails a lot, instead of summarising chunks and then summarising the summaries when searching it should answer the question and ignore chunks where the answer isn't. (That fixes it for me) The JSON instruction approach isn't the best, it's a chat-bot and wants to talk. My approach allows it to talk and embed commands within the text. I stream the responses and look for the completion signal, then check for commands. It's way more reliable. I'm working on it using git to create and edit files instead of file writing directly. Again way more reliable. I'm also working on the memory problem by looking into dynamically training a GPT-3.5 model on the conversation instead of using a vector store. I think this is closer to what our brains do in terms of long term memory. Not got far with that yet. I don't feel that contributing to AutoGPT will really help much; it feels too much like a runaway train. My 2-brain approach isn't a fantasy it's how my solution is working. Im not trying to build Chappie, just a tool that can help me solve more complex research and coding problems.

Boostrix commented 1 year ago

I'm working on it using git to create and edit files instead of file writing directly. Again way more reliable.

That is something that I also tinkered with, and it's actually proved useful in and of itself (being able to look at different stages of output). It could also be a cheap way to "checkpoint" progress by using the git tree as a "dump space" for all sorts of state/info (files + log messages).

I don't feel that contributing to AutoGPT will really help much; it feels too much like a runaway train. given the popularity of the project and its sheer number of PRs/contributors, I kinda feel your pain. Even though I am not sure that I agree with your conclusion. If you think the "2-Brains" approach is that promising, providing a demo using the sub-agent mechanism shouldn't be that difficult ? Or are you using a different LLM backend ?

The point being, we're all tinkering with this to automate some aspects of your lives obviously - and what may work for some of us, might at least be useful to others - either as an experimental patch, or as an actual pull request to provide the corresponding feature as an option - so that we can all tinker with different ideas to see what works best for our own use cases.

Besides, "2 brains" may sound super fancy - but under the hood, it's really just piggybacking onto the idea of using two different agents to apply mutually conflicting constaints to the solution space - so it's not that we're talking rocket science here, it does sound to me like a sound idea - and I suppose all of us are quite familiar with the "dialog" we're listening in our heads when thinking. So, emulating this idea to see where it takes is, sounds like a good idea to me - regardless of other issues.

Not to play devil's advocate here, but maybe, just maybe: what we're lacking to solve the memory issue is indeed 2 brain hemispheres ? :-)

katmai commented 1 year ago

people fall in love with their fantasies and all the hype. Can you first make even a GPT-4-only AutoGPT session improve 2 go/py files of 600 lines of code by itself ? ?? Maybe you should work on helping us solve the memory issues before you fantasise about AGI, giving it personas and living the dream. Everyone is a genius when it comes to forking their own AutoGPT with left brain right brain personas, but get outside your bubble and help the main development right here about memory management - there are actual hard challenges to solve ASAP

i disagree with this too. i think the hard challenge is understanding where you want to go and what you want to do. and by you, i mean you.

"can you make it improve by itself?" - improve what? improvement is just a concept of yours. you're seeing the imperfections and you want it to improve. you're asking it to "improve" but does it even think it needs "improvement" or does it like the way it is?

the only thing i can think of that you could possibly think when you're talking about "self-improvement" is for it to go online, find some code best practices and ... try to logically implement that in it's code, or something along those lines, because surely there's no way you can be talking about self-improvement to someone that you don't even know or care if it even wants to improve or is happy as is, or is.

so about the ASAP part, i think you should slow down and figure out if you understand what you're asking. and if you're unsure maybe you need to slow down even more until you do. right now you're just bouncing in a hall of mirrors with concepts of your own making. or mine.

Boostrix commented 1 year ago

in its simplest form, self-improvement in this context could be as simple as taking the python file of a plugin and adding/improving a few comments/Python docstrings. That's not exactly rocket science, and the system do so already. The next step might be renaming variables or optimizing some loop/constructs.

Thus, once again, self-improvement in this case really isn't that far-fetched, it all boils down to what people have in mind. But the foundation, and the option, is already there. It just needs to be used, tested and extended.

Sure, telling a script to "self-improve" by checking out its own repository and working through a bunch of PRs is probably the ultimate wet dream of any developer - but heck folks, maybe you need a little more patience ? Given it a few more months guys and you will probably see it being used to review PRs and issues, you will probably see it being used to add commands and plugins and maybe sum up frequently reported issues to come up with some over-arching strategy to implement new features.

None of that is off the table, regardless of the context window restriction. Before you can walk: Baby steps ...

willguest commented 1 year ago

people fall in love with their fantasies and all the hype.

Can you first make even a GPT-4-only AutoGPT session improve 2 go/py files of 600 lines of code by itself ? ??

Maybe you should work on helping us solve the memory issues before you fantasise about AGI, giving it personas and living the dream. Everyone is a genius when it comes to forking their own AutoGPT with left brain right brain personas, but get outside your bubble and help the main development right here about memory management - there are actual hard challenges to solve ASAP

I mean, I guess you're contributing. You present an alternative perspective in a space which helps it to be less of a "bubble". Feels like a bit of a rant though. Is the goal to shut down this issue/conversation, or do you simply wish to deprioritize it in favour of... i'm sorry which issue are you referencing? I didn't see it mentioned because you were too busy being offended by someone else's conversation.

I feel the discussion here has more nuance that you give it credit for. I don't think there are many here who believe that the LLM will just "write its own code", rather they understand the iterative process of software improvement and want to involve the machine in that process in a meaningful way.

stukennedy commented 1 year ago

This should give you the gist of what I'm doing without too much detail. I call the chat from another thread that first calls the right brain with the messages history (initially just holds the user prompt). It stacks the messages onto the unique system message for each brain hemisphere. When the brain responds I check it for commands and process them ... the responses go back in the thread and the same brain is called again until it sends no more commands, it then passes to the other brain.

import sys
from termcolor import colored
import openai

def send_request(messages, color, model, temperature, presence_penalty, frequency_penalty):
    openai.api_key = open(".openai-api-key.txt").read().strip()
    total_content = ""
    for chunk in openai.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=temperature,
            presence_penalty=presence_penalty,
            frequency_penalty=frequency_penalty,
            stream=True,
    ):
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content is not None:
            sys.stdout.write(colored(content, color))
            sys.stdout.flush()
            total_content += content
    print()
    print(colored("-------------------------", "white"))
    return total_content.strip()

command_action_prompt = f"""
    I have given you access to the internet by providing you with the following commands:
        $web_page_summary("url", "query")$ - search a web page for a summary and links based on your query
        $google_search("query")$ - search google for a list of links based on your query
    Google search results can be used to drive web page summary, and web page summary can be used to drive google search.
    I have given you file access by providing you with the following commands:
        $file_read("filename")$ - read the contents of a file
        $file_write("filename", "content")$ - write the content to a file
        $file_append("filename", "content")$ - append the content to a file
        $file_delete("filename")$ - delete a file
        $search_files("directory")$ - search for files in a directory
        $summarize_text("filename")$ - get a summary of some text
        $bash("command")$ - run a bash command
    You should use these commands to help the user solve their problem.
    I recommend using the bash commands along with file access to write new files and run scripts that extend your capabilities.
    If you would like to ask the user a question or for input or clarification you should use the following command:
        $user("question")$ - ask the user for action, input, or clarification
    you should have at most one command in any response.
    """

def right_brain(responses):
    messages = [{"role": "system", "content": f"""
        You are the right side of the brain that is full of ideas and inspiration.
        Your role is to help the user solve their problem by providing ideas and inspiration in collaboration with the 
        left brain.
        {command_action_prompt}
        Avoid duplicating the left brain's response. If you fully agree with the left brain, you should just say so rather than
        repeating what they said.
        Think outside the box and be creative.
        Optimising for creativity, not correctness.
        You will present your ideas to the left brain, who will decide if they are good enough to be implemented.
        You should alwats start each of your responses with "Right brain:"."""}] + responses
    model = "gpt-4"
    temperature = 0.8
    presence_penalty = 0
    frequency_penalty = 0
    return send_request(messages, "light_red", model, temperature, presence_penalty, frequency_penalty)

def left_brain(responses):
    messages = [{"role": "system", "content": f"""
        You are the left sight of the brain.
        Your role is to help the user solve their problem by criticising and correcting suggestions coming from the right brain.
        You should help the right brain to be more strucured and logical, and to think about the consequences of their ideas.
        You should suggest plans and strategies to help keep the ultimate goal in mind .. and not allow the right brain to get too distracted.
        You will decide if the ideas are good enough to be implemented. And suggest improvements if they are not.
        {command_action_prompt}
        Avoid duplicating the right brain's response. If you fully agree with the right brain, you should just say so rather than
        repeating what they said.
        When you are satisfied with the solution, or want to end the conversation you will say "[DONE]".
        You should always start each of your responses with "Left brain:"."""}] + responses
    model = "gpt-4"
    temperature = 0.2
    presence_penalty = 1
    frequency_penalty = 1
    return send_request(messages, "cyan", model, temperature, presence_penalty, frequency_penalty)

def summarize(responses):
    messages = [{"role": "system",
                 "content": "You are a content summarizer. Be concise and accurate."}] + responses
    model = "gpt-3.5-turbo"
    temperature = 0.7
    presence_penalty = 1
    frequency_penalty = 1
    return send_request(messages, "green", model, temperature, presence_penalty, frequency_penalty)

Often it forgets to write code in the command ... and writes it in a code block. So I'm thinking of changing it to make it fit better with its natural way of responding (i.e. encourage it to communicate code changes with diffs and label the file to be written to). I think this may really help with it writing code.

I think the key is to work WITH GPTs preferential approach to dialogue instead of forcing it down a JSON API-esque route. I think we lose resolution and nuance that way, because it loses the benefit of the way it has structured previous responses. I'm doing this independently to keep it small and simple, so that I can quickly try different approaches. But I think Auto-GPT is an amazing project and I'm definitely copying some of the neat tricks for integration (e.g. Google search and web page search) I also stream the output and write everything to a file ... because it's readable it is a helpful resource for picking up where it left off etc.

katmai commented 1 year ago

import sys
from termcolor import colored
import openai

def send_request(messages, color, model, temperature, presence_penalty, frequency_penalty):
    openai.api_key = open(".openai-api-key.txt").read().strip()
    total_content = ""
    for chunk in openai.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=temperature,
            presence_penalty=presence_penalty,
            frequency_penalty=frequency_penalty,
            stream=True,
    ):
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content is not None:
            sys.stdout.write(colored(content, color))
            sys.stdout.flush()
            total_content += content
    print()
    print(colored("-------------------------", "white"))
    return total_content.strip()

command_action_prompt = f"""
  I have given you access to the internet by providing you with the following commands:
        $web_page_summary("url", "query")$ - search a web page for a summary and links based on your query
        $google_search("query")$ - search google for a list of links based on your query
    Google search results can be used to drive web page summary, and web page summary can be used to drive google search.
    I have given you file access by providing you with the following commands:
        $file_read("filename")$ - read the contents of a file
        $file_write("filename", "content")$ - write the content to a file
        $file_append("filename", "content")$ - append the content to a file
        $file_delete("filename")$ - delete a file
        $search_files("directory")$ - search for files in a directory
        $summarize_text("filename")$ - get a summary of some text
        $bash("command")$ - run a bash command
    You should use these commands to help the user solve their problem.
    I recommend using the bash commands along with file access to write new files and run scripts that extend your capabilities.
  If you would like to ask the user a question or for input or clarification you should use the following command:
        $user("question")$ - ask the user for action, input, or clarification
    you should have at most one command in any response.
    """

def right_brain(responses):
    messages = [{"role": "system", "content": f"""
        You are the right side of the brain that is full of ideas and inspiration.
        Your role is to help the user solve their problem by providing ideas and inspiration in collaboration with the 
        left brain.
        {command_action_prompt}
      Avoid duplicating the left brain's response. If you fully agree with the left brain, you should just say so rather than
      repeating what they said.
        Think outside the box and be creative.
        Optimising for creativity, not correctness.
        You will present your ideas to the left brain, who will decide if they are good enough to be implemented.
        You should alwats start each of your responses with "Right brain:"."""}] + responses
    model = "gpt-4"
    temperature = 0.8
    presence_penalty = 0
    frequency_penalty = 0
    return send_request(messages, "light_red", model, temperature, presence_penalty, frequency_penalty)

def left_brain(responses):
    messages = [{"role": "system", "content": f"""
        You are the left sight of the brain.
        Your role is to help the user solve their problem by criticising and correcting suggestions coming from the right brain.
        You should help the right brain to be more strucured and logical, and to think about the consequences of their ideas.
        You should suggest plans and strategies to help keep the ultimate goal in mind .. and not allow the right brain to get too distracted.
        You will decide if the ideas are good enough to be implemented. And suggest improvements if they are not.
        {command_action_prompt}
        Avoid duplicating the right brain's response. If you fully agree with the right brain, you should just say so rather than
      repeating what they said.
        When you are satisfied with the solution, or want to end the conversation you will say "[DONE]".
        You should always start each of your responses with "Left brain:"."""}] + responses
    model = "gpt-4"
    temperature = 0.2
    presence_penalty = 1
    frequency_penalty = 1
    return send_request(messages, "cyan", model, temperature, presence_penalty, frequency_penalty)

def summarize(responses):
    messages = [{"role": "system",
                 "content": "You are a content summarizer. Be concise and accurate."}] + responses
    model = "gpt-3.5-turbo"
    temperature = 0.7
    presence_penalty = 1
    frequency_penalty = 1
    return send_request(messages, "green", model, temperature, presence_penalty, frequency_penalty)

ah. so you are expressing yourself and your way of thinking through that code.

Boostrix commented 1 year ago

FWIW, copying the priming command for the left/right brain hemispheres into a new ai_settings.yaml file and use those to start sub-agents, does yield some interesting "conversations" and "results". Not saying it works perfectly, but it's definitely an interesting idea (inspiration by mother nature). I feel that this could be something supported out of the box by the system, by using the equivalent of an "agent mode" - having a brain agent that has two sub-agents for left/right hemisphere seems like an interesting approach to pursue.

Emasoft commented 1 year ago

Prerequisites of this are: https://github.com/Significant-Gravitas/Auto-GPT/issues/3445 https://github.com/Significant-Gravitas/Auto-GPT/issues/3686

Also all commands should be converted in plugins.

aishwd94 commented 1 year ago

I have the following two questions/suggestions:

Question 1: I am assuming that this recursive self improvement would be applicable to to other projects as well. Like for example, if one wants Auto-GPT to build a large website which is domain-specific and requires domain-specific research on the internet, then that should be possible. For this purpose, it should have the ability to commit and push the code on ANY repository that it is working on and not just https://github.com/Significant-Gravitas/Auto-GPT/ , so shouldn't this be a Generic Recursive Self Improvement and not specific to Auto-GPT only ? So one way could be to feed the branch TIP to Auto-GPT and also give it the ability to "ask" for historical commits ( using git log ) and visit those points as. and when needed, since, if a human is coding, he/she does not go through the entire commit history. So at the most fundamental level, don't we just need to integrate all the git commands available into Auto-GPT in git_operations.py ?

Question 2: Only applicable for Windows and other OS with GUI, in this case I am only considering Windows

Sometimes in my job, I just need to do a lot of dumb repetitive tasks like copying files from one location to other, followed by some other task say, executing a command, followed by running an executable on a network path that changes every time. These tasks keep on changing always, and they are really just for setting up the development environment before beginning the development itself and automating these could boost productivity since writing automations for these ever-changing and trivial tasks is a waste of time.

What could really help in this case is: If there were a way to send an image of the desktop to chatGPT and have it move the pointer so that this workflow is automated.

As of today, chatGPT is still not accepting image input although OpenAI has announced that this feature would be launched soon. But until then, if there were a command to click on an area on the screen OR click a Window/Form Button, that would give much more control to Auto-GPT.

So the three options in this case are: Read the contents of a Form/Window, feed its description to chatGPT/AI Agent (until the image input feature comes out) and let it process the current task or do whatever it wants to do, and then give it a way to click on a button via:

a. Absolute coordinates on the screen ( a simple plugin developed in .NET and available as a command that takes absolute x,y coordinates, chatGPT will have to come up with the coordinates to click on by itself) b. A .NET automation for example - https://learn.microsoft.com/en-us/dotnet/framework/ui-automation/find-a-ui-automation-element-based-on-a-property-condition?source=recommendations c. A UIPath RPA script

Note: This is very dangerous since Auto-GPT can basically control your computer then, and thus very strict safeguards will have to be developed first before even starting to develop and integrate this.

Please comment on the feasibility of these two questions/suggestions.

yhyu13 commented 1 year ago

https://github.com/Significant-Gravitas/Auto-GPT/issues/15#issuecomment-1531947009 @ftl-traveller

I believe we can use this open source Apache 2.0 license python RPA framework instead of UIPath https://github.com/robocorp/rpaframework

Boostrix commented 1 year ago

For this purpose, it should have the ability to commit and push the code on ANY repository that it is working on and not just https://github.com/Significant-Gravitas/Auto-GPT/ , so shouldn't this be a Generic Recursive Self Improvement and not specific to Auto-GPT only ?

you can already make it push to other repos, and even create new ones from scratch - it just won't go through git_operations.py, but invoke the CLI directly. A number of folks reported cloning different repos and working on those using Auto-GPT. Then again, I do understand where you thinking comes from - since git_operations.py contains indeed some hard-coded assumptions

If there were a way to send an image of the desktop to chatGPT and have it move the pointer so that this workflow is automated.

You will probably want to look up the term "multi modality" in the context of GPT

As of today, chatGPT is still not accepting image input although OpenAI has announced that this feature would be launched soon. But until then, if there were a command to click on an area on the screen OR click a Window/Form Button, that would give much more control to Auto-GPT.

Given your use case, you may want to search discussions/RFEs mentioning "SingularGPT", someone is basically working exactly on your idea/approach: https://github.com/Significant-Gravitas/Auto-GPT/issues/276#issuecomment-1499946489 https://github.com/abhiprojectz/SingularGPT

Prerequisites of this are [...]

I suppose, a layered approach would even work "as is" (right now) to create commands/plugins, as long as each layer validates itself (imagine a stack of layers):

one layer (think agent) to ensure that the modification is valid Python code
one layer (agent) to ensure that the modification is a valid command/plugin module
one layer (agent) to run all tests of the command/plugin
one layer (agent) to ensure that the modification contains the desired functionality

Boostrix commented 1 year ago

https://github.com/Significant-Gravitas/Auto-GPT/issues/15#issuecomment-1529858630

My whole point of being on git and this community is to learn more Python and Git/GitHub. I've been pondering this as well and for initial testing for plugin creation I believe it would be a lot easier to create a throwaway GitHub account & let our bots play with some accounts. I think initially we can have auto gpt work on assigned issues/pull requests and test it by feeding instructions. In my understanding a PR/issue are functionally the same? Obviously a PR would hold more weight but we want the bot to submit PR's right? Not approve? Or are we starting off with functionality to manage the codebase first?

So, first step towards "self improvement": Let the script help our devs to manage open PRs.I kept it running over night to complete a python script according to some of the previously mentioned heuristics by parsing this issue to bootstrap the whole thing.

The heuristics it is now [supposed to be] using are:

only list PRs that have no conflicts
only list PRs that are not touching files modified by other PRs
list only PRs that are touching few files/lines (code changes)

The result is this (again, I haven't checked a single thing/PR here, this is all based on code created by running Auto-GPT and guiding it every once in a while):

PR #3706: Update Dockerfile - add missing scripts and plugins directories. (1 modified files, complexity score: 186)
PR #3701: ADD: Bash block in the contributing markdown (1 modified files, complexity score: 186)
PR #3690: Update setup.md (1 modified files, complexity score: 186)
PR #3633: Fix setup link in docs (1 modified files, complexity score: 186)
PR #3625: Update PULL_REQUEST_TEMPLATE.md to include linting (1 modified files, complexity score: 186)
PR #3564: Fix "unsafe repository" warning after creating devcontainer, initializes repository with proper user. (1 modified files, complexity score: 186)
PR #3481: switching from unittest to pytest in test_json_parser (1 modified files, complexity score: 186)
PR #3261: added scrape_text.py integration test (1 modified files, complexity score: 186)
PR #3061: Added type hints to function and method signature (1 modified files, complexity score: 186)
PR #3651: Add FastChat API for local models. (2 modified files, complexity score: 372)
PR #3569: Interaction Base (5 modified files, complexity score: 930)

My suggestion would be to get more eyeballs involved at this point, to gather some feedback. Specifically, we need a few Python folks (my background being C++, Auto-GPT wrote the Python code here) I would also think that adding "write" support for the github API would be useful, so that overlapping PRs (touching related files should be automatically made aware of related PRs).

My idea would be to also compute a "potential conflict" score, that would be based on the number of files (and line numbers) touched by a PR that are also touched by other PRs.

Just to clarify, I am not the slightest bit interested in coming up with the heuristics here, I am merely interested in addressing #1454 (helping brainstorm ideas to manage PRs and issues). My thinking here being that this project is facing an interesting challenge when it comes to its popularity/fame and activity, given the sheer number of weekly PRs and issue reports.

I am hoping that some of us could team up to develop some useful tooling, in close collaboration with the devs to help automate/streamline some of these tasks a little more, and then take it from there. At this point, feedback is obviously crucial. However just being able to prioritize the review/integration of 10+ PRs would IMO be worthwhile in and of itself.

Speaking in general, if this is considered useful enough, it being written in Python now - it could obviously be turned into a plugin for Auto-GPT to have better integration, too. That way, more folks could get involved. For the time being, the goal would be to focus on PRs and issues next, and once those features are in place/working, we could actually try to let this tackle some issues in a semi-automated fashion. For starters, that may involve someone labelling issues/RFEs accordingly, so that Auto-GPT could try to tackle those (we could focus on improving docs/tooling for instance).

johnseth97 commented 1 year ago

I think that the Halting problem makes this idea of recursive self improvement functionally impossible.

A computer program will never be able to fully understand itself without getting stuck an an infinite loop.

That's a hard limit of any Turing machine, unless somehow this has ascended the instruction sets embedded into our silicon chips

Boostrix commented 1 year ago

I think that the Halting problem makes this idea of recursive self improvement functionally impossible.

I suppose the majority of the people interested in this are interested in experimenting with the idea of pursuing this, to see how far it gets us - totally accepting that even just 10% of the idea may be far off, but that even just having 0.1% of this implemented could be very useful given the nature of the approach.

Just being able to look at simple PRs/issues or even adapt/write SIMPLE commands and plugins isn't impossible, and it could go a long way to demonstrate to newcomers what the system is capable of.

Even right now, there are PRs related to this idea - just see #765, #2642

That being said, when it comes to Auto-GPT the current "halting problem" can be summed up as it invoking vim/nano/joe every once in a while - aka interactive tools, despite it having been told not to do so, and that problem has a number of potential solutions to it.

Previous Next