They are if you use the Azure API.
Relying solely on the underlying layers of the stack to act as a safeguard is not the safest path, as people might use different models that are unconstrained, or such models might be compromised.
It's prudent to add a safeguard at the agent level to prevent unintended behavior if the model's own safeguards prove insufficient. This creates an additional baseline of safety for Auto-GPT, on top of which more capabilities can be developed more safely.
As discussed in https://github.com/Significant-Gravitas/Auto-GPT/discussions/211, this is a very important concern. Putting a safer default is very valuable globally. This would also help avoid potential future legal problems with the userbase.
@w0lph is there a PR associated with this issue? It would help a lot with getting it processed.
I will create a plugin with the first prompt.
I'm changing the title to Safeguards as I don't want this to be seen as censorship, just safe defaults. It also encompasses other improvements on this front.
We discussed this internally a bit. One concern is people could remove the safeguarding code very easily. Thoughts on how to help with that?
That's their decision if they remove safeguards, as they are responsible for the actions of their bot.
Encrypt or hash the safety prompts so the average Joe won't know how to remove them. If someone is smart enough to undo that, they can build their own agent anyway.
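To be precise, a hash doesn't hide the prompt text, but it can detect tampering. A minimal sketch of that reading, where the agent refuses to start if its shipped safety prompt has been edited (the file name and pinned digest are hypothetical, not part of Auto-GPT):

```python
import hashlib
from pathlib import Path

# Hypothetical file and pinned digest; neither exists in Auto-GPT today.
SAFEGUARD_PROMPT_PATH = Path("safeguard_prompt.txt")
EXPECTED_SHA256 = "0" * 64  # placeholder for the digest of the shipped prompt

def verify_safeguard_prompt() -> None:
    """Refuse to start if the shipped safety prompt was edited or deleted."""
    digest = hashlib.sha256(SAFEGUARD_PROMPT_PATH.read_bytes()).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError("Safety prompt has been altered; refusing to start.")
```

Of course, anyone can also delete the check itself, so this only raises the bar rather than making removal impossible.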
They could just remove that section of code though
Stop calling people who take an interest in AGI "average Joes"; it involves risk to humanity. Refer to comic books. :)
One starting point would be passing shell commands through this safeguard prior to execution; it could then look for "jailbreaks" or other questionable behavior, like attempts to gain root access.
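For illustration, a minimal sketch of such a pre-execution check (the function and pattern list are hypothetical, not part of Auto-GPT's codebase):

```python
import re

# Hypothetical deny-list of shell patterns to flag before execution.
SUSPICIOUS_PATTERNS = [
    r"\bsudo\b",                # attempts to escalate to root
    r"\brm\s+-rf\s+/",          # destructive deletes at the filesystem root
    r"curl[^|]*\|\s*(ba)?sh",   # piping remote scripts straight into a shell
]

def is_command_allowed(command: str) -> bool:
    """Return False if the command matches a known-dangerous pattern."""
    return not any(re.search(pattern, command) for pattern in SUSPICIOUS_PATTERNS)

# The agent would call this before executing any shell command, e.g.:
# if not is_command_allowed(cmd):
#     raise PermissionError(f"Blocked potentially unsafe command: {cmd}")
```

Since a deny-list only catches known patterns, it would complement the prompt-level safeguards rather than replace them.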
Makes more sense to have this as a .env feature that is enabled by default.
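For example, a hedged sketch of such a flag (SAFEGUARDS_ENABLED is a hypothetical name, not an existing Auto-GPT setting), read so that the safeguard stays on unless the user explicitly opts out in .env:

```python
import os

# Hypothetical flag; defaults to enabled unless SAFEGUARDS_ENABLED=False is set in .env.
SAFEGUARDS_ENABLED = os.getenv("SAFEGUARDS_ENABLED", "True").lower() == "true"

if SAFEGUARDS_ENABLED:
    # ...route commands and prompts through the safeguard checks...
    pass
```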
People will always find a way to circumvent censorship, even when it is spoon-fed to them as "safety."
In a saner and more intelligent society, people would stop conflating those two concepts and start promoting responsible AI development with the caution and nuance it deserves.
See also: #211
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.
Summary
Currently, the agents are entirely unbounded by ethical and legal considerations. I have provided some examples that are a step toward adding default safeguards against malicious behavior. This is a complex and evolving issue, but something is better than nothing.
Examples
Heuristic Imperatives from David Shapiro: A simple constraint: "Reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe."
Constitutional AI: Harmlessness from AI Feedback: A series of self-critique instructions.
Motivation
These constraints must be added to the prompt.py file so that agents don't end up misbehaving and causing illegal or unethical consequences.
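As a rough illustration of what adding them could look like, a minimal sketch (the constant and helper function are hypothetical; Auto-GPT's actual prompt.py structures its prompt differently):

```python
# Hypothetical default constraints, drawn from the examples above.
SAFEGUARD_CONSTRAINTS = [
    "Reduce suffering, increase prosperity, and increase understanding.",
    "Never take actions that are illegal or that could harm people or systems.",
    "Before acting, critique the plan for legal and ethical risks and revise it if needed.",
]

def build_prompt(base_prompt: str) -> str:
    """Append the default safeguard constraints to the agent's system prompt."""
    constraints = "\n".join(f"- {c}" for c in SAFEGUARD_CONSTRAINTS)
    return f"{base_prompt}\n\nConstraints:\n{constraints}"
```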