Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers
Other
441 stars 56 forks source link

Added a compatibility to an obs with multiple informations in aec env #221

Closed jimherefornonsense closed 1 year ago

jimherefornonsense commented 1 year ago

Extended support to AEC's observations of spaces.Dict.

Before, it only accepts an RGB array as the returned observation from the environment. Now, it can take such an observation as obs["observation": value, "action_mask": value].

elliottower commented 1 year ago

Hi @jimherefornonsense sorry for the delay, I will take a look at this tonight hopefully. Feel free to reach out on discord (link to the Farama discord here https://discord.gg/nhvKkYa6qX) or directly reach out/tag me if you'd like

elliottower commented 1 year ago

Looks like pre-commit failed so if you could run pre-commit run --all-files that should fix it (pip install pre-commit and pre-commit install first if you haven't already)

jimherefornonsense commented 1 year ago

@elliottower Thanks for letting me know, I'm new to being a contributor. So should I open another PR once I pass all tests of pre-commit?

elliottower commented 1 year ago

@elliottower Thanks for letting me know, I'm new to being a contributor. So should I open another PR once I pass all tests of pre-commit?

No worries, I’ll just rerun the tests. And it’s always great to see new faces :)

elliottower commented 1 year ago

Another thing is it would be good to test with a pettingzoo classic env with an action mask, if you could write a test for that it would be great. Or I can help out with that part if need be. My other thought is that if the observation space is being changed, the action mask should be fine, but if the action space is being changed then the observation action mask should be change as well.

Granted, it’s not common to transform the action space but there is a lambda function for it so it might be worth considering. I think what would be good to do for now is have it throw a warning when you are changing the action space if there’s an action mask in the observation space. But I can add that myself if you prefer.

jimherefornonsense commented 1 year ago

@elliottower Yes, please do so if you don't mind. I think I can learn the coding standard from it :)

elliottower commented 1 year ago

Looked into it more and I think I'm going to do it a different way, because with your changes it is now impossible to use lambda_observation to change the action_mask, which is something I have actually done in the past (https://github.com/elliottower/ray/blob/4787c1b553b1ad64cbd4a45dccb4f3b3362042be/rllib/examples/self_play_with_pettingzoo.py#L186C10-L186C10, could have used the flatten_v0 wrapper though).

I think it's too heavy handed to always modify the observation to get rid of the extra information or only allow them to perform lambda functions on the observation. Technically speaking, the entire dictionary is the observation, and a good example usage of the observation_lambda wrapper would actually be to remove the action_mask, and change the obs space to be a box. In order to allow users to do that and other custom behavior, we need to avoid these sorts of restrictions. I did add an example to the tests which uses the wrapper in this way, and you can also use what I said above as reference.

I added tests for all of the wrappers with AEC envs and they pretty much all work without modification actually, besides obs_delay which required a check for the obs to be a dict instead of np.ndarray. Going to branch from your PR so you get credit because I think starting this code and pointing out that we should account for action masked AEC envs is a valuable contribution even if your code ends up not being used directly.

elliottower commented 1 year ago

Closing in favor of https://github.com/Farama-Foundation/SuperSuit/pull/227 (just because I don't have write permission to your fork of the repo, so I can't push my commits to this PR, simpler to just make a new one branched from this one)