They are if you use the Azure API.
Relying solely on the underlying layers of the stack to act as a safeguard is not the safest path, as people might use different models that are unconstrained, or such models might be compromised.
It's prudent to add a safeguard at the agent level to prevent unintended behavior if the model's own safeguards prove insufficient. This creates an additional baseline of safety for Auto-GPT, on top of which more capabilities can be developed more safely.
As discussed in https://github.com/Significant-Gravitas/Auto-GPT/discussions/211, this is a very important concern. Putting a safer default is very valuable globally. This would also help avoid potential future legal problems with the userbase.
@w0lph is there a PR associated with this issue? It would help a lot with getting it processed.
I will create a plugin with the first prompt.
I'm changing the title to Safeguards as I don't want this to be seen as censorship, just safe defaults. It also encompasses other improvements on this front.
We discussed this internally a bit. One concern is people could remove the safeguarding code very easily. Thoughts on how to help with that?
That's their decision if they remove safeguards, as they are responsible for the actions of their bot.
Encrypt or hash the safety prompts so the average Joe won't know how to remove them. If someone is smart enough to undo that, they can build their own agent anyway.
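To be precise, a hash doesn't hide the prompt text, but it can detect tampering. A minimal sketch of that reading, where the agent refuses to start if its shipped safety prompt has been edited (the file name and pinned digest are hypothetical, not part of Auto-GPT):

```python
import hashlib
from pathlib import Path

# Hypothetical file and pinned digest; neither exists in Auto-GPT today.
SAFEGUARD_PROMPT_PATH = Path("safeguard_prompt.txt")
EXPECTED_SHA256 = "0" * 64  # placeholder for the digest of the shipped prompt

def verify_safeguard_prompt() -> None:
    """Refuse to start if the shipped safety prompt was edited or deleted."""
    digest = hashlib.sha256(SAFEGUARD_PROMPT_PATH.read_bytes()).hexdigest()
    if digest != EXPECTED_SHA256:
        raise RuntimeError("Safety prompt has been altered; refusing to start.")
```

Of course, anyone can also delete the check itself, so this only raises the bar rather than making removal impossible.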
They could just remove that section of code though
Stop calling people who take an interest in AGI "average Joes"; it involves risk to humanity. Refer to comic books. :)
One starting point would be passing shell commands through this safeguard prior to execution; it could then look for "jailbreaks" or other questionable behavior, like attempts to gain root access.
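For illustration, a minimal sketch of such a pre-execution check (the function and pattern list are hypothetical, not part of Auto-GPT's codebase):

```python
import re

# Hypothetical deny-list of shell patterns to flag before execution.
SUSPICIOUS_PATTERNS = [
    r"\bsudo\b",                # attempts to escalate to root
    r"\brm\s+-rf\s+/",          # destructive deletes at the filesystem root
    r"curl[^|]*\|\s*(ba)?sh",   # piping remote scripts straight into a shell
]

def is_command_allowed(command: str) -> bool:
    """Return False if the command matches a known-dangerous pattern."""
    return not any(re.search(pattern, command) for pattern in SUSPICIOUS_PATTERNS)

# The agent would call this before executing any shell command, e.g.:
# if not is_command_allowed(cmd):
#     raise PermissionError(f"Blocked potentially unsafe command: {cmd}")
```

Since a deny-list only catches known patterns, it would complement the prompt-level safeguards rather than replace them.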
Makes more sense to have this as a .env feature that is enabled by default.
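For example, a hedged sketch of such a flag (SAFEGUARDS_ENABLED is a hypothetical name, not an existing Auto-GPT setting), read so that the safeguard stays on unless the user explicitly opts out in .env:

```python
import os

# Hypothetical flag; defaults to enabled unless SAFEGUARDS_ENABLED=False is set in .env.
SAFEGUARDS_ENABLED = os.getenv("SAFEGUARDS_ENABLED", "True").lower() == "true"

if SAFEGUARDS_ENABLED:
    # ...route commands and prompts through the safeguard checks...
    pass
```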
People will always find a way to circumvent censorship, even when it is spoon-fed to them as "safety."
In a saner and more intelligent society, people would stop conflating those two concepts and start promoting responsible AI development with the caution and nuance it deserves.
See also: #211
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.
Summary
Currently, the agents are entirely unbounded by ethical and legal considerations. I have provided some examples that are a step toward adding default safeguards against malicious behavior. This is a complex and evolving issue, but something is better than nothing.
Examples
Heuristic Imperatives from David Shapiro: A simple constraint: "Reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe."
Constitutional AI: Harmlessness from AI Feedback: A series of self-critique instructions.
Motivation
These constraints must be added to the prompt.py file so that agents don't end up misbehaving and causing illegal or unethical consequences.
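As a rough illustration of what adding them could look like, a minimal sketch (the constant and helper function are hypothetical; Auto-GPT's actual prompt.py structures its prompt differently):

```python
# Hypothetical default constraints, drawn from the examples above.
SAFEGUARD_CONSTRAINTS = [
    "Reduce suffering, increase prosperity, and increase understanding.",
    "Never take actions that are illegal or that could harm people or systems.",
    "Before acting, critique the plan for legal and ethical risks and revise it if needed.",
]

def build_prompt(base_prompt: str) -> str:
    """Append the default safeguard constraints to the agent's system prompt."""
    constraints = "\n".join(f"- {c}" for c in SAFEGUARD_CONSTRAINTS)
    return f"{base_prompt}\n\nConstraints:\n{constraints}"
```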