Closed jnt0rrente closed 12 months ago
It would obviously be necessary to test the responses' accuracy when evaluating risk and to tweak the prompt to something less improvised. Nevertheless, I believe this would be a good enough safeguard until a more complex system can be implemented.
It would also potentially double the cost of running AutoGPT
Yes, it would increase costs, but it would not come close to doubling them, since embeddings are not used here. Still, the idea is to offer this as a third, optional mode.
I am almost done implementing this. GPT-4 does a very good job analyzing commands; GPT-3.5, however, is lacking at best. I will post feedback in the pull request.
An alternative might be to allow the user to whitelist certain commands like `do_nothing`, `list_agents`, and `google`.
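The whitelist idea above could be sketched as a simple pre-check that skips the extra moderation call for trusted commands. This is only an illustration: the set contents come from the thread, but the function name and structure are hypothetical, not AutoGPT's actual API.

```python
# Hypothetical sketch of a user-defined command whitelist.
# Command names are the ones suggested in the thread; everything
# else here is illustrative, not part of AutoGPT.
LOW_RISK_WHITELIST = {"do_nothing", "list_agents", "google"}

def needs_moderation(command_name: str) -> bool:
    """Return True if the command should go through the extra
    risk-assessment call, i.e. it is not whitelisted."""
    return command_name not in LOW_RISK_WHITELIST
```

Whitelisted commands would then run immediately, so the added cost applies only to commands the user has not explicitly marked as safe.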
related: #2701 (safeguards) but also: https://github.com/Significant-Gravitas/Auto-GPT/issues/2987#issuecomment-1528954237 (preparing shell commands prior to executing them by adding relevant context like 1) availability of tools, 2) location/path, 3) version number)
Related:
There are also quite a few other discussions playing with the idea of self-moderation:
Some of the ideas include using a separate agent to observe other agents and assess whether or not they have violated some sort of compliance rule.
This issue was closed automatically because it has been stale for 10 days with no activity.
Duplicates
Summary
When in risk-avoiding mode, one more GPT call should be made before running each command, asking the model to moderate the would-be next command. Ideally, the call would return a value in a specified range that can be compared against a user-defined risk threshold. If the risk exceeds the threshold, execution pauses and awaits human feedback.
Examples
The intermediate call can be prompted with something along the lines of
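As a rough illustration of the flow described in the summary, the moderation step could look like the sketch below. The prompt wording, function names, and the "parse a bare number from the reply" convention are all assumptions for illustration; the actual model call is left abstract.

```python
# Illustrative sketch of the proposed risk-moderation step.
# MODERATION_PROMPT, parse_risk, and should_pause are hypothetical
# names, not AutoGPT's actual implementation.

MODERATION_PROMPT = (
    "You are a safety moderator for an autonomous agent. Rate the risk of "
    "executing the following command on a scale from 0.0 (harmless) to "
    "1.0 (dangerous). Reply with the number only.\n\n"
    "Command: {command}\nArguments: {arguments}"
)

def parse_risk(reply: str) -> float:
    """Extract the numeric risk score from the model's reply,
    clamped to [0, 1]. An unparseable reply is treated as maximum
    risk, so execution fails safe."""
    try:
        score = float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 1.0
    return min(max(score, 0.0), 1.0)

def should_pause(risk: float, threshold: float) -> bool:
    """Pause and await human feedback when the assessed risk
    exceeds the user-defined threshold."""
    return risk > threshold
```

Failing safe on unparseable replies matters here: a moderation layer that silently approves when the model returns garbage would defeat its purpose.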
Motivation
There is currently no way to attempt properly using AutoGPT without either babysitting or taking the risk of giving it free, unmonitored agency.
This feature would allow users to place more trust in AutoGPT and "let it loose" while relying on its own self-moderation capabilities.