Closed: cswinter closed this issue 2 years ago
Thanks for looking into it. Oh, this is interesting. I know the reason: in gym-microrts I decide whether an actor is available by checking that it is not "busy" (i.e., not executing any action at the moment), but an actor can be idle and still have no available actions. For example, a barracks may be idle but not have enough resources to produce any units.
I am OK with Option 1 but lean a bit towards Option 2 with a warning. I don't have a strong opinion on this, though.
Actually, for simplicity, Option 1 may be more desirable :)
Implementation-wise, it wouldn't be that complicated; it just needs one loop in the `__post_init__` method of `Observation`.
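A minimal sketch of what such a loop could look like, assuming the check raises an error as soon as it finds an actor with no allowed action. `MaskedObservation` and its fields are hypothetical stand-ins for illustration, not entity-gym's actual `Observation` definition:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedObservation:
    """Hypothetical stand-in for an observation with one mask row per actor."""

    actor_ids: List[int]
    # mask[i][j] is True if action j is allowed for actor_ids[i].
    mask: List[List[bool]]

    def __post_init__(self) -> None:
        if len(self.actor_ids) != len(self.mask):
            raise ValueError(
                f"expected {len(self.actor_ids)} mask rows, got {len(self.mask)}"
            )
        # The single loop: reject any actor whose mask row allows no action.
        for actor_id, row in zip(self.actor_ids, self.mask):
            if not any(row):
                raise ValueError(
                    f"all actions are masked out for actor {actor_id}; "
                    "omit the actor from actor_ids instead"
                )
```

With a check like this, the barracks case above would fail at observation construction time with a message naming the actor, rather than later inside the policy.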
When an action mask prevents all actions for an actor, we currently get a fairly inscrutable error message:
The issue is the row of all `nan` values, which is caused by a row of logits that are all set to `-inf` by the mask.

It's slightly unclear what the best way of handling this is. When all actions are masked out, no action is valid, so just returning a random action could break the environment. Here are the main options I can think of:
`False`. The advantage is that things "just work"; the disadvantages are the performance impact (though this is probably fine, since we don't expect `Environment` to be maximally efficient), hiding potential logic errors in the environment implementation, and that the number of actions given to the `act` method could be surprising and violate some assumption, since it won't match the number of `actor_ids` specified by the environment.

@vwxyzjn what are your thoughts on what the ideal API would do?
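To make the failure mode and the filtering behavior concrete, here is a rough pure-Python sketch. It assumes a numerically stable softmax (subtracting the row maximum before exponentiating), which is one place a `nan` row can arise when every logit is `-inf`. The helper names `masked_softmax` and `filter_fully_masked` are made up for illustration and are not part of entity-gym:

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over logits, with masked-out entries set to -inf first."""
    masked = [x if m else NEG_INF for x, m in zip(logits, mask)]
    top = max(masked)
    # If every entry is -inf, then -inf - (-inf) is nan and the whole row is nan.
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def filter_fully_masked(
    actor_ids: List[int], mask: List[List[bool]]
) -> Tuple[List[int], List[List[bool]]]:
    """Drop actors whose mask row is all False."""
    kept = [(a, row) for a, row in zip(actor_ids, mask) if any(row)]
    return [a for a, _ in kept], [row for _, row in kept]


actor_ids = [7, 8]
mask = [[True, False], [False, False]]
probs = masked_softmax([0.5, 1.5], mask[1])  # every entry is nan
kept_ids, kept_mask = filter_fully_masked(actor_ids, mask)  # only actor 7 remains
```

After filtering, the policy would produce an action for one actor even though the environment listed two, which is the count mismatch described above.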