Checklist
[X] I have checked that there is no similar issue in the repo
❓ Question
I am using a DQN agent to optimise order release. Each order has an earliest start date, which is only reached over the course of the simulation. Currently, when the agent selects an order that may not yet be released, it is penalised via the reward function. It would be preferable to restrict the action space with a mask instead, so that the agent can only select jobs whose earliest start date has already been reached.
Hence my question: is it possible to mask the action space for a DQN agent during training and/or inference in this way?
I am very grateful for any help!
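For context, a common workaround (used when a library's DQN has no built-in invalid-action masking, as is the case in Stable-Baselines3, whose sb3-contrib package only ships `MaskablePPO`) is to mask the Q-values at action-selection time: set the Q-value of every not-yet-releasable order to negative infinity before taking the argmax. This is only an illustrative sketch, and `masked_greedy_action` is a hypothetical helper name, not a library API:

```python
import numpy as np

def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Greedy action over valid actions only.

    Invalid actions (mask False) get a Q-value of -inf,
    so argmax can never select them.
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Example: 4 orders; only orders 0 and 2 have reached their earliest start date
q = np.array([0.7, 3.5, 1.2, 2.9])
mask = np.array([True, False, True, False])
print(masked_greedy_action(q, mask))  # → 2 (best among the valid actions)
```

Note that masking only at prediction time does not change what the network learns about invalid actions; for masking during training as well, the same -inf trick would have to be applied inside the agent's action-selection and target-computation code.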