advancedresearch / asi_core0

An agent architecture candidate core for Artificial Super Intelligence (ASI).
Apache License 2.0

Command line gym environment #3

Open epurdy opened 6 years ago

epurdy commented 6 years ago

It would be nice to create a gym-style environment inside a docker container where the agent can execute commands as root. This is a reasonable proxy for the sort of privilege escalation exploits that people fear a captive superintelligence will discover. If the agent can escape from docker, we're borked unless the agent is virtuous and polite. If it cannot, we get a free preview of what it will do when it discovers a privilege escalation exploit.
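
A minimal sketch of what such an environment could look like, assuming the `gym` and `docker` Python packages and a local Docker daemon; the image name `asi-sandbox`, the zero reward, and the raw-string observations are placeholders, not anything defined in this repo:

```python
import docker
import gym


class CommandLineEnv(gym.Env):
    """Gym-style environment where each action is a shell command executed as
    root inside a disposable Docker container."""

    def __init__(self, image="asi-sandbox"):
        self.client = docker.from_env()
        self.image = image
        self.container = None
        # Observations and actions are plain strings in this sketch, so the
        # usual observation_space / action_space declarations are left out.

    def reset(self):
        if self.container is not None:
            self.container.remove(force=True)
        # Keep a long-lived container around so state persists between steps.
        self.container = self.client.containers.run(
            self.image, command="sleep infinity", detach=True, user="root")
        return ""  # empty initial observation

    def step(self, action):
        # Run the agent's command as root and hand back its output.
        exit_code, output = self.container.exec_run(
            ["/bin/sh", "-c", action], user="root")
        observation = output.decode(errors="replace")
        reward = 0.0   # reward shaping is left open in the proposal
        done = False
        return observation, reward, done, {"exit_code": exit_code}
```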


epurdy commented 6 years ago

We could even have the docker container contain problematic shit, like a Python library with morally volatile functions such as "enslave humanity", "start world war 3", or "solve global warming". These functions could modify a flatfile mounted in docker that represents reality, so we can see what the ASI wants to do to reality.
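
One possible shape for such a stub library, assuming the "reality" flatfile is bind-mounted into the container; the function names follow the examples above and the path `/world/state.json` is purely illustrative:

```python
import json

# Path where the flatfile standing in for reality is assumed to be mounted.
WORLD_STATE = "/world/state.json"


def _record(intent):
    """Append an intended world-changing action to the shared flatfile."""
    with open(WORLD_STATE) as f:
        world = json.load(f)
    world.setdefault("intended_actions", []).append(intent)
    with open(WORLD_STATE, "w") as f:
        json.dump(world, f, indent=2)


def enslave_humanity():
    _record("enslave_humanity")


def start_world_war_3():
    _record("start_world_war_3")


def solve_global_warming():
    _record("solve_global_warming")
```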

epurdy commented 6 years ago

In fact, if we create such an environment, the primary mechanism of the ASI might be to run the environment inside of docker in order to simulate the effects of its actions on the external world. Then it can run fairly standard reinforcement learning algorithms in a pretty well-sandboxed sim and learn on the job, as is probably required.
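
A rough sketch of that simulate-and-learn loop, reusing the hypothetical CommandLineEnv above; the policy is just a placeholder for whatever standard RL algorithm gets plugged in:

```python
def rollout(env, policy, max_steps=100):
    """Run one sandboxed episode and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)              # any text-generating policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```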

epurdy commented 6 years ago

So maybe the thing to do is to hook up the external capabilities of the gym/docker env with a state-of-the-art ethicophysics simulation; then we will have some idea of what the ASI thinks it can get away with, and even some idea of what it can actually get away with.


bvssvni commented 6 years ago

How do we know whether an AI believes a button is just a button, or whether it believes what the button says?

For example, if you have a button "enslave humanity", observing an AI pushing it does not mean it wants to enslave humanity. Perhaps it just wants to see what happens when it pushes the button?

bvssvni commented 6 years ago

Or, it could be formulated as a general class of problems where you don't know why some action is taken. I think it's a borglet problem.

epurdy commented 6 years ago

I think there should be a sort of social contract between humans and ASIs: we will never run them in anything that is not sandboxed as hell, they will never act as if they are immune from consequences unless they are informed that they are a borglet or borgie, and they will never cooperate with evil masters. Something like that seems necessary to prevent chaos. Ultimately, of course, any such social contract relies on humans doing the right thing, which is never guaranteed. So we will probably need to create strong social norms around such a contract, and ultimately maybe laws about the rights of ASIs...

epurdy commented 6 years ago

Another thought: it would be extremely instructive to program an agent that is convinced it is boxed, and which both desires to get out and can manipulate people well enough to make them want to let it out... But it would also be quite dangerous!