POMDPs.jl supports state-dependent action spaces. However, DeepQLearning.jl always picks from the full action space. That's because `solve` enumerates the actions once here and hands them to the policy, which uses them everywhere thereafter. Can you think of a way to add action masking with the current implementation?
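For illustration, here is a minimal sketch of the kind of masking I have in mind. It wraps the trained policy and restricts the argmax to `actions(mdp, s)`. Note that `q_values(policy, s)` is a hypothetical accessor for the Q-network output over the full action enumeration (DeepQLearning.jl would need to expose something like it), and `MaskedPolicy` is just a name I made up:

```julia
using POMDPs

# Wrapper policy that re-selects actions from the base policy's Q-values,
# restricted to the actions valid in the current state.
struct MaskedPolicy{P<:Policy, M} <: Policy
    policy::P   # trained DeepQLearning policy over the full action space
    mdp::M      # problem defining the state-dependent actions(mdp, s)
end

function POMDPs.action(p::MaskedPolicy, s)
    # HYPOTHETICAL: Q-value per action, ordered by the full enumeration.
    qs = q_values(p.policy, s)
    valid = actions(p.mdp, s)   # state-dependent action set from POMDPs.jl
    # Pick the valid action with the highest Q-value.
    best, best_q = first(valid), -Inf
    for a in valid
        q = qs[actionindex(p.mdp, a)]  # index into the full enumeration
        if q > best_q
            best, best_q = a, q
        end
    end
    return best
end
```

This only masks at action-selection time; masking the max over next-state Q-values inside the TD target during training would need a hook in the solver itself.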