Closed timorl closed 5 years ago
can you elaborate on why this would be cleaner? not sure I see much difference from a user's perspective
Less clutter in registered envs and simpler to implement saying "run this env, but this time cheat". The latter makes a switch in the agents repo easier -- if we had an option like --cheat
we would just need to check for it and wrap the env if it is set. If those are different registered environments then one would have to use more complex maps of environment names and the logic of choosing them would be more involved.
Also it is conceptually nicer and very simple to implement -- we already provide all the information needed for cheating, so a wrapper is very simple. Just run the underlaying step or reset and substitute info["hidden_reward"]
for the reward.
I agree that it would be cleaner not to register multiple environments. Instead of having a wrapper, couldn't we just set the cheat attribute to True after making the environment if we want to cheat?
The toy environments don't have such an attribute. I could add one, but I still think a wrapper is simpler -- see #9 where I wrote it, it just has a couple of lines. If you disagree we can go the attribute route.
I think there's a larger issue here -- as mentioned in an inline comment in /safe_grid_agents/common/utils/meters.py
from jvmancuso/safe-grid-agents#39, there's some functionality from the other envs that's missing in these toy environments. We'll want to match those up with the current API before merging into safe-grid-agents, so let's try to do that in #9
Just wanted to mention that cheating is already handled in the agents repo. I don't see a strong enough reason to move it out of that repo and into this one, at least for the Spiky CRMDP project.
I fixed the mismatches mentioned in my above comment in #10. Not going to close this in case you guys want to discuss it more.
Just wanted to mention that cheating is already handled in the agents repo.
Oh, I wasn't aware, thanks. Then I think we don't need any way to cheat on the level of environments. Can we close this then @david-lindner ?
Just wanted to mention that cheating is already handled in the agents repo.
Oh, I wasn't aware, thanks. Then I think we don't need any way to cheat on the level of environments. Can we close this then @david-lindner ?
I was not aware of this either, but it seems to me to be the best solution. Then we can remove the cheating argument from the wrapper for the ai-safety-gridworlds
(#11) and can also do not need this wrapper for cheating. Hence, yes we can close this :)
Perhaps rather than making new environments for cheating we could have a wrapper. I think this would be cleaner both here and in usage.