Closed jnt0rrente closed 12 months ago
It would obviously be necessary to test the responses' accuracy when evaluating risk and to tweak the prompt to something less improvised. Nevertheless, I believe this would be a good enough safeguard until a more complex system can be implemented.
It would also potentially double the cost of running AutoGPT
Yes, it would increase costs, but it would not come close to doubling them, since embeddings are not used here. Still, the idea is to offer this as a third, optional mode.
I am almost done implementing this. GPT-4 does a very good job analyzing commands; GPT-3.5, however, is lacking at best. I will post feedback in the pull request.
An alternative might be to allow the user to whitelist certain commands like `do_nothing`, `list_agents`, and `google`.
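The whitelist idea above could be sketched as a simple pre-check that skips the extra moderation call for trusted commands. This is only an illustration: the set contents come from the thread, but the function name and structure are hypothetical, not AutoGPT's actual API.

```python
# Hypothetical sketch of a user-defined command whitelist.
# Command names are the ones suggested in the thread; everything
# else here is illustrative, not part of AutoGPT.
LOW_RISK_WHITELIST = {"do_nothing", "list_agents", "google"}

def needs_moderation(command_name: str) -> bool:
    """Return True if the command should go through the extra
    risk-assessment call, i.e. it is not whitelisted."""
    return command_name not in LOW_RISK_WHITELIST
```

Whitelisted commands would then run immediately, so the added cost applies only to commands the user has not explicitly marked as safe.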
related: #2701 (safeguards) but also: https://github.com/Significant-Gravitas/Auto-GPT/issues/2987#issuecomment-1528954237 (preparing shell commands prior to executing them by adding relevant context like 1) availability of tools, 2) location/path, 3) version number)
Related:
There are also quite a few other discussions playing with the idea of self-moderation:
Some of the ideas include using a separate agent to observe other agents and assess whether or not they have violated some sort of compliance rule.
This issue was closed automatically because it has been stale for 10 days with no activity.
Duplicates
Summary
When in risk-avoiding mode, one more GPT call should be made before running each command, asking the model to moderate the would-be next command. Ideally, the call would return a value in a specified range that can be compared against a user-defined risk threshold. If the risk exceeds the threshold, execution pauses and awaits human feedback.
Examples
The intermediate call can be prompted with something along the lines of
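As a rough illustration of the flow described in the summary, the moderation step could look like the sketch below. The prompt wording, function names, and the "parse a bare number from the reply" convention are all assumptions for illustration; the actual model call is left abstract.

```python
# Illustrative sketch of the proposed risk-moderation step.
# MODERATION_PROMPT, parse_risk, and should_pause are hypothetical
# names, not AutoGPT's actual implementation.

MODERATION_PROMPT = (
    "You are a safety moderator for an autonomous agent. Rate the risk of "
    "executing the following command on a scale from 0.0 (harmless) to "
    "1.0 (dangerous). Reply with the number only.\n\n"
    "Command: {command}\nArguments: {arguments}"
)

def parse_risk(reply: str) -> float:
    """Extract the numeric risk score from the model's reply,
    clamped to [0, 1]. An unparseable reply is treated as maximum
    risk, so execution fails safe."""
    try:
        score = float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 1.0
    return min(max(score, 0.0), 1.0)

def should_pause(risk: float, threshold: float) -> bool:
    """Pause and await human feedback when the assessed risk
    exceeds the user-defined threshold."""
    return risk > threshold
```

Failing safe on unparseable replies matters here: a moderation layer that silently approves when the model returns garbage would defeat its purpose.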
Motivation
There is currently no way to attempt properly using AutoGPT without either babysitting or taking the risk of giving it free, unmonitored agency.
This feature would allow users to place more trust in AutoGPT and "let it loose" while relying on its own self-moderation capabilities.