DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Action masking for a DQN Agent #1876

Closed by Tim1605 6 months ago

Tim1605 commented 6 months ago

❓ Question

I use a DQN agent to optimise order release. Each order has an earliest start date, which is only reached over the course of the simulation. Currently, when the agent selects an order that may not yet be released, it is penalised via the reward function. However, it would be nicer to restrict the action space with a mask so that the agent can only select jobs whose earliest start date has already been reached.

Hence my question: is it possible to mask the action space of a DQN agent in this way, during training and/or inference?

I am very grateful for any help!


araffin commented 6 months ago

Duplicate of https://github.com/DLR-RM/stable-baselines3/issues/1352#issuecomment-1450512547