Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License
465 stars 173 forks source link

[Feature request] `check_env` could cause crashes with MaskablePPO #145

Closed AlexPasqua closed 1 year ago

AlexPasqua commented 1 year ago

check_env calls reset and step, but the latter is executed passing a random action obtained through action_space.sample().

In my custom environment, if an action is masked (not available), it is not selectable, and if it's passed to step it causes a crash. This does not normally happen because of the mask, but check_env doesn't take the mask into account and simply samples an action. The masking doesn't actually change the action space, so a non-available action might actually be sampled.

Would it be possible to take the action mask into account when executing step within check_env?

System Info

 python -c 'import stable_baselines3 as sb3; sb3.get_system_info()'

- OS: Linux-3.10.0-1062.4.3.el7.x86_64-x86_64-with-glibc2.27 # 1 SMP Wed Nov 13 23:58:53 UTC 2019
- Python: 3.8.0
- Stable-Baselines3: 1.8.0a3
- PyTorch: 1.12.1+cu102
- GPU Enabled: True
- Numpy: 1.23.4
- Gym: 0.21.0
araffin commented 1 year ago

Hello, check_env is meant to be used for debugging. Once the check passes, you don't have to use it and can use the learning algorithm directly (which is aware of the valid actions).

and if it's passed to step it causes a crash.

why not disable that and sample from the valid actions when debugging? (so when using the env checker).

Would it be possible to take the action mask into account when executing step within check_env?

you can do that by overriding action_space.sample() by something which is aware of the action mask (without modifying the env checker). Something like self.action_space.sample = self.valid_action_sampler where valid_action_sampler is a class method.

AlexPasqua commented 1 year ago

Would it be possible to take the action mask into account when executing step within check_env?

you can do that by overriding action_space.sample() by something which is aware of the action mask (without modifying the env checker). Something like self.action_space.sample = self.valid_action_sampler where valid_action_sampler is a class method.

It makes sense.

Maybe it's worth specifying in the docs of MaskablePPO the fact that check_env doesn't take the mask into account?

araffin commented 1 year ago

Maybe it's worth specifying in the docs of MaskablePPO the fact that check_env doesn't take the mask into account?

yes, I'm happy to receive a PR ;)

AlexPasqua commented 1 year ago

Maybe it's worth specifying in the docs of MaskablePPO the fact that check_env doesn't take the mask into account?

yes, I'm happy to receive a PR ;)

Alright, I'll do it later today!