Closed AlexPasqua closed 1 year ago
Hello,
check_env is meant to be used for debugging. Once the check passes, you don't have to use it and can use the learning algorithm directly (which is aware of the valid actions).
"and if it's passed to step it causes a crash."
Why not disable that and sample from the valid actions when debugging (that is, when using the env checker)?
"Would it be possible to take the action mask into account when executing step within check_env?"
You can do that by overriding action_space.sample() with something that is aware of the action mask (without modifying the env checker). Something like self.action_space.sample = self.valid_action_sampler, where valid_action_sampler is a method of your environment class.
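A minimal sketch of that override, for anyone landing here later. To keep it runnable without extra packages, Discrete below is a tiny stand-in I wrote for the real action space class, and MaskedEnv, its fixed mask, and the error raised in step are all hypothetical. Only the self.action_space.sample = self.valid_action_sampler line comes from the suggestion above. I am assuming the env exposes its mask through an action_masks() method.

```python
import random


class Discrete:
    """Stand-in for a discrete action space (illustration only, not the real class)."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Uniform over all n actions, ignoring any mask.
        return random.randrange(self.n)


class MaskedEnv:
    """Hypothetical environment in which only some actions are valid."""

    def __init__(self):
        self.action_space = Discrete(4)
        # Replace the sampler on this space instance with a mask-aware one, so
        # any caller of action_space.sample() only ever receives valid actions.
        self.action_space.sample = self.valid_action_sampler

    def action_masks(self):
        # True marks a valid action; here only actions 1 and 3 are available.
        return [False, True, False, True]

    def valid_action_sampler(self):
        # Sample uniformly among the currently valid actions.
        valid = [a for a, ok in enumerate(self.action_masks()) if ok]
        return random.choice(valid)

    def step(self, action):
        # Mimics the crash described in the issue when a masked action is passed.
        if not self.action_masks()[action]:
            raise ValueError(f"action {action} is masked")
        return action


env = MaskedEnv()
sampled = [env.action_space.sample() for _ in range(200)]
```

The assignment only shadows sample on that one space instance, so the env checker itself stays untouched. In a real env the mask usually depends on the current state, so valid_action_sampler should read it at call time, as it does here, rather than caching it.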
It makes sense.
Maybe it's worth specifying in the docs of MaskablePPO the fact that check_env doesn't take the mask into account?
yes, I'm happy to receive a PR ;)
Alright, I'll do it later today!
check_env calls reset and step, but the latter is executed passing a random action obtained through action_space.sample().
In my custom environment, if an action is masked (not available), it is not selectable, and if it's passed to step it causes a crash. This does not normally happen because of the mask, but check_env doesn't take the mask into account and simply samples an action. The masking doesn't actually change the action space, so a non-available action might actually be sampled.
Would it be possible to take the action mask into account when executing step within check_env?
System Info