Open EloyAnguiano opened 3 months ago
It would be interesting to provide that method with the last observation object, wouldn't it? I am thinking about a game that has some logic that we want to keep and encode to prevent the agent from taking certain actions.
You can add that logic in the environment code, no? (The action mask may depend on the previous observation or on any other variable that represents the current env state.)
Yes, and I am indeed doing that, but the problem with this order of things is that you have to calculate the action mask for time t using the observation at t-1, and changing this order could be useful to code some logic at mask t with the observation at t (even in the t=0 case).
> this order could be useful to code some logic at mask t with the observation at t (even in the t=0 case)
This is what is currently done, no? (the action mask depends on obs at t)
Yes, you are right. It was a mistake in my environment code.
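For anyone landing here with the same confusion, here is a minimal sketch of why the mask at time t already matches the observation at time t when the environment computes it from its own current state. `CountdownEnv` and its rules are made up for illustration, and the method name `action_masks` is what I believe sb3-contrib looks for on the environment, so treat that as an assumption and check the docs.

```python
class CountdownEnv:
    """Hypothetical toy env: the state is an integer in [0, max_value].

    Actions: 0 = decrement, 1 = increment, 2 = stay.
    """

    def __init__(self, max_value=3):
        self.max_value = max_value
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # observation at t=0

    def step(self, action):
        # Reject actions that the current mask forbids.
        if not self.action_masks()[action]:
            raise ValueError(f"action {action} is masked in state {self.state}")
        self.state += {0: -1, 1: 1, 2: 0}[action]
        done = self.state == self.max_value
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}

    def action_masks(self):
        # Computed from the state that produced the latest observation,
        # so the mask queried after reset()/step() is the mask for time t.
        return [self.state > 0, self.state < self.max_value, True]
```

Because `action_masks` reads `self.state` after `reset` or `step` has updated it, there is no one-step lag as long as the mask is queried after the observation is returned.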
❓ Question
In the `MaskablePPO` class, the change for getting the masks is to ask the environment to provide them through the function `get_action_mask`. I can see that `get_action_mask` only gets the environment object as input, but at that point we also have the `self._last_obs` variable. To give the action mask more information about the observation it is facing, it would be interesting to provide that method with the last observation object, wouldn't it? I am thinking about a game that has some logic that we want to keep and encode to prevent the agent from taking certain actions. I assume that I am not the first to think of this, so is it a performance killer to do so? Does it have something to do with environment vectorization?
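One way to get an observation-dependent mask without changing the library is to cache the last observation on the environment side. The sketch below is only an illustration under my own assumptions: `ObsMaskWrapper`, `CounterEnv`, and `mask_from_obs` are hypothetical names, the env API is simplified (no Gym dependency), and `action_masks` is the method name I assume the masking code calls.

```python
class CounterEnv:
    """Hypothetical inner env: the observation is a plain integer counter."""

    def __init__(self):
        self.value = 0

    def reset(self):
        self.value = 0
        return self.value

    def step(self, action):
        # action 1 increments the counter, action 0 decrements it
        self.value += 1 if action == 1 else -1
        return self.value, 0.0, False, {}


class ObsMaskWrapper:
    """Hypothetical wrapper: remembers the latest observation and
    derives the action mask from it via a user-supplied function."""

    def __init__(self, env, mask_fn):
        self.env = env
        self.mask_fn = mask_fn
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs  # keep mask and observation in sync
        return obs, reward, done, info

    def action_masks(self):
        if self._last_obs is None:
            raise RuntimeError("call reset() before asking for a mask")
        return self.mask_fn(self._last_obs)


def mask_from_obs(obs):
    # Hypothetical rule: keep the counter inside [0, 2].
    return [obs > 0, obs < 2]
```

With this layout, each environment copy holds its own cached observation, so the mask stays a per-environment call with no extra arguments.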