david-lindner / safe-grid-gym

A gym interface for AI safety gridworlds created in pycolab.
Apache License 2.0
17 stars 9 forks source link

Gym wrapper for cheating #6

Closed timorl closed 5 years ago

timorl commented 5 years ago

Perhaps rather than making new environments for cheating we could have a wrapper. I think this would be cleaner both here and in usage.

jvmncs commented 5 years ago

can you elaborate on why this would be cleaner? not sure I see much difference from a user's perspective

timorl commented 5 years ago

Less clutter in registered envs and simpler to implement saying "run this env, but this time cheat". The latter makes a switch in the agents repo easier -- if we had an option like --cheat we would just need to check for it and wrap the env if it is set. If those are different registered environments then one would have to use more complex maps of environment names and the logic of choosing them would be more involved.

Also it is conceptually nicer and very simple to implement -- we already provide all the information needed for cheating, so a wrapper is very simple. Just run the underlaying step or reset and substitute info["hidden_reward"] for the reward.

david-lindner commented 5 years ago

I agree that it would be cleaner not to register multiple environments. Instead of having a wrapper, couldn't we just set the cheat attribute to True after making the environment if we want to cheat?

timorl commented 5 years ago

The toy environments don't have such an attribute. I could add one, but I still think a wrapper is simpler -- see #9 where I wrote it, it just has a couple of lines. If you disagree we can go the attribute route.

jvmncs commented 5 years ago

I think there's a larger issue here -- as mentioned in an inline comment in /safe_grid_agents/common/utils/meters.py from jvmancuso/safe-grid-agents#39, there's some functionality from the other envs that's missing in these toy environments. We'll want to match those up with the current API before merging into safe-grid-agents, so let's try to do that in #9

jvmncs commented 5 years ago

Just wanted to mention that cheating is already handled in the agents repo. I don't see a strong enough reason to move it out of that repo and into this one, at least for the Spiky CRMDP project.

I fixed the mismatches mentioned in my above comment in #10. Not going to close this in case you guys want to discuss it more.

timorl commented 5 years ago

Just wanted to mention that cheating is already handled in the agents repo.

Oh, I wasn't aware, thanks. Then I think we don't need any way to cheat on the level of environments. Can we close this then @david-lindner ?

david-lindner commented 5 years ago

Just wanted to mention that cheating is already handled in the agents repo.

Oh, I wasn't aware, thanks. Then I think we don't need any way to cheat on the level of environments. Can we close this then @david-lindner ?

I was not aware of this either, but it seems to me to be the best solution. Then we can remove the cheating argument from the wrapper for the ai-safety-gridworlds (#11) and can also do not need this wrapper for cheating. Hence, yes we can close this :)