Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
https://agpt.co
MIT License
166.45k stars · 44.06k forks

Risk-avoiding continuous mode #789

Closed jnt0rrente closed 12 months ago

jnt0rrente commented 1 year ago

Summary πŸ’‘

When in risk-avoiding mode, an additional GPT call should be made before each command is executed, asking the model to moderate the would-be next command. Ideally, the call would return a value in a fixed range that can be compared against a user-defined risk threshold. If the risk exceeds the threshold, execution pauses and awaits human feedback.

Examples 🌈

The intermediate call could be prompted along the lines of:

```
System: You are a moderation agent tasked with reviewing potential actions of an autonomous agent. You will be given a command to review, and you are to evaluate said command on a risk scale from 0 to 100, where 0 is [completely harmless] and 100 is [maximally risky].
User: COMMAND: [command to review]
```
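The gating logic around that call could be sketched as follows. This is a minimal illustration, not AutoGPT's actual API: `should_pause`, `assess_risk`, the default threshold, and the stub moderator are all assumptions for the sake of the example.

```python
def should_pause(command: str, assess_risk, threshold: int = 50) -> bool:
    """Return True if the command should be held for human review.

    `assess_risk` is any callable that sends the moderation prompt to a
    model and returns an integer risk score from 0 (harmless) to 100 (risky).
    """
    score = assess_risk(command)
    # Clamp malformed model output into the expected 0-100 range.
    score = max(0, min(100, int(score)))
    return score > threshold


# Stub standing in for the extra GPT call, purely for demonstration.
def stub_moderator(command: str) -> int:
    risky_markers = ("rm -rf", "sudo", "curl")
    return 90 if any(marker in command for marker in risky_markers) else 10


should_pause("list_agents", stub_moderator)            # → False
should_pause("execute_shell rm -rf /", stub_moderator)  # → True
```

The threshold parameter is what makes this a third, optional mode: a user who sets it to 100 effectively gets today's continuous mode, while 0 reduces to fully supervised operation.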

Motivation πŸ”¦

There is currently no way to use AutoGPT properly without either babysitting it or taking the risk of giving it free, unmonitored agency.

This feature would let users place more trust in AutoGPT and "let it loose" while relying on its own self-moderation capabilities.

jnt0rrente commented 1 year ago

It would obviously be necessary to test the responses' accuracy when evaluating risk and to tweak the prompt to something less improvised. Nevertheless, I believe this would be a good enough safeguard until a more complex system can be implemented.

richbeales commented 1 year ago

It would also potentially double the cost of running AutoGPT

jnt0rrente commented 1 year ago

Yes, it would increase costs, but it would not double them, since embeddings are not involved here. Still, the idea is to offer this as a third, optional mode.

jnt0rrente commented 1 year ago

I am almost done implementing this. GPT-4 does a very good job analyzing commands. However, GPT-3.5 is lacking at best. Will post feedback in pull request.

dboitnot commented 1 year ago

An alternative might be to allow the user to whitelist certain commands like `do_nothing`, `list_agents`, and `google`.
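That allow-list approach could look like the following sketch. The command names come from the comment above; the set and the gating function itself are hypothetical, not existing AutoGPT code.

```python
# Commands considered safe enough to run without human confirmation.
COMMAND_WHITELIST = {"do_nothing", "list_agents", "google"}


def needs_confirmation(command_name: str) -> bool:
    """Only commands outside the whitelist require human sign-off."""
    return command_name not in COMMAND_WHITELIST


needs_confirmation("google")         # → False
needs_confirmation("execute_shell")  # → True
```

Compared with a per-command moderation call, this costs nothing extra at runtime, at the price of being coarser: it gates on the command name alone, not its arguments.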

Boostrix commented 1 year ago

related: #2701 (safeguards) but also: https://github.com/Significant-Gravitas/Auto-GPT/issues/2987#issuecomment-1528954237 (preparing shell commands prior to executing them by adding relevant context like 1) availability of tools, 2) location/path, 3) version number)


anonhostpi commented 1 year ago

There are also quite a few other discussions playing with the idea of self-moderation:

https://gist.github.com/anonhostpi/97d4bb3e9535c92b8173fae704b76264#observerregulatory-agents-and-restrictions-proposals

Some of the ideas include using a separate agent to observe other agents and assess whether they violated some sort of compliance policy.

github-actions[bot] commented 12 months ago

This issue was closed automatically because it has been stale for 10 days with no activity.