Closed p-i- closed 1 year ago
As I mentioned elsewhere, I firmly believe the project should add a new directory to the repository where all such ai_settings.yaml files can be committed and maintained (so they have history, etc.). The point is that any sort of AI inevitably needs some form of benchmark and training data.
Thus, if the ~120k people currently using this project shared some of their dysfunctional ai_settings.yaml files, this benchmark suite could grow rapidly over a couple of weeks. It would then be a piece of cake to run the suite every once in a while to see which changes to Auto-GPT help the agent complete more tasks, or complete them more correctly.
So please consider having a benchmark suite of YAML files: ideally one category of files that are known to work well (for regression testing) and another for files that "break" Auto-GPT (endless loops, repeating itself, etc.).
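To make the two-category idea concrete, here is a minimal Python sketch of how such a directory could be scanned. The folder names `known_good` and `known_bad` and the helper `collect_cases` are hypothetical; nothing like this exists in the repository as far as this thread shows.

```python
from pathlib import Path

# Hypothetical folder names; the proposal only asks for two categories.
CATEGORIES = ("known_good", "known_bad")


def collect_cases(root):
    """Map each category to the sorted *.yaml files under root/<category>/."""
    root = Path(root)
    cases = {}
    for category in CATEGORIES:
        folder = root / category
        # A category folder that does not exist yet yields an empty list.
        cases[category] = sorted(folder.glob("*.yaml")) if folder.is_dir() else []
    return cases
```

Calling `collect_cases("benchmarks")` (again, an assumed path) would give the regression cases and the "breaking" cases as two separate lists, so each group can be run and reported on its own.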
Once this is in place, this could help improve the project rather rapidly.
For future reference:
https://github.com/Significant-Gravitas/Auto-GPT/issues/15#issuecomment-1498370501
Basically, this whole GitHub issue revolves around tests: we'll need to run benchmarks in GitHub Actions to validate that it's not "losing" capability at every pull request. The benchmark has to use the same version of GPT every time and has to test the whole spectrum of what Auto-GPT can do:
- write text
- browse the internet
- execute commands
- etc, etc...
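A rough Python sketch of what that could look like: run a fixed task list against one pinned model, compute a pass rate per capability, and flag any capability that got worse than a stored baseline. Everything here is an assumption for illustration: the task ids, the `PINNED_MODEL` placeholder, and the `run_task(task, model)` callable standing in for an actual Auto-GPT run.

```python
from collections import defaultdict

# Placeholder only, not a real model name: the point is that it never varies between runs.
PINNED_MODEL = "gpt-model-pinned-for-benchmarks"

# Hypothetical tasks, one per capability listed in the comment above.
TASKS = [
    {"id": "write-haiku", "capability": "write text"},
    {"id": "find-release-date", "capability": "browse the internet"},
    {"id": "list-directory", "capability": "execute commands"},
]


def pass_rates(tasks, run_task, model=PINNED_MODEL):
    """Return {capability: fraction passed}; run_task(task, model) must return a bool."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for task in tasks:
        total[task["capability"]] += 1
        if run_task(task, model):
            passed[task["capability"]] += 1
    return {cap: passed[cap] / total[cap] for cap in total}


def regressions(baseline, current):
    """List capabilities whose pass rate dropped below the baseline run."""
    return sorted(cap for cap, rate in baseline.items() if current.get(cap, 0.0) < rate)
```

A CI job could then fail the pull request whenever `regressions(baseline, current)` is non-empty, which is one way to catch a capability being lost.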
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.
Summary 💡
In order to improve AutoGPT, we need examples of simple tasks which it should be able to ace but DOESN'T.
If you've found a good candidate task, please paste the contents of your ai_settings.yaml in a comment.
Examples 🌈
This gets lost doing Google searches.
Motivation 🔦
If we have decent benchmarks we can figure out where the technology is weakest, and apply focus at the key points.