DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Why does DQN in Stable Baselines3 not support MultiDiscrete action spaces? #745

Closed · PBerit closed this issue 2 years ago

PBerit commented 2 years ago

Hi all,

I would like to use Deep Q-Learning, and I have a continuous state space and 3 actions (thus a 3-dimensional action space). What I found confusing is that the Stable Baselines3 version of DQN can't handle MultiDiscrete action spaces according to the official website https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#.

As far as I understand it, a MultiDiscrete action space is just multiple discrete actions, each with its own number of possible values. Normally, an Artificial Neural Network, as used in Deep Q-Learning for the mapping, should be able to map any number of inputs to any number of outputs. So why can the Stable Baselines3 version of DQN only have 1 action variable?
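For reference, a MultiDiscrete space is just a vector of independent discrete choices. A minimal illustration with Gym (the dimension sizes below are made up for the example):

```python
from gym import spaces

# Hypothetical example: three sub-actions with 5, 3, and 2 choices each.
space = spaces.MultiDiscrete([5, 3, 2])
print(space.sample())  # e.g. array([4, 0, 1]) -- one choice per dimension
```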

Miffyli commented 2 years ago

Regular DQN only works with a Discrete action space, where it chooses one action from many, because it predicts the Q-value for every possible action. A straightforward (but crude) way to handle a MultiDiscrete space is to predict a Q-value for every possible combination of the individual Discrete choices, but the number of combinations quickly blows up.
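To make the blow-up concrete, here is a minimal sketch of that crude flattening approach as a Gym ActionWrapper (the wrapper name is made up; this is not part of stable-baselines3):

```python
import gym
import numpy as np

class FlattenMultiDiscrete(gym.ActionWrapper):
    """Expose a MultiDiscrete action space as one flat Discrete space
    by enumerating every combination of sub-actions."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiDiscrete)
        self.nvec = env.action_space.nvec
        # e.g. MultiDiscrete([4, 3, 2]) -> Discrete(4 * 3 * 2) = Discrete(24)
        self.action_space = gym.spaces.Discrete(int(np.prod(self.nvec)))

    def action(self, act):
        # Decode the flat index back into one index per sub-action.
        return np.array(np.unravel_index(act, self.nvec))
```

With such a wrapper, MultiDiscrete([4, 3, 2]) becomes Discrete(24), but MultiDiscrete([10, 10, 10]) is already Discrete(1000), which is why this approach does not scale.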

For a more sophisticated approach (and more details on the matter), see this paper on branching DQN.
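For context, the branching idea replaces the single output layer with one Q-head per action dimension, so the number of outputs grows additively rather than multiplicatively. A rough, hypothetical PyTorch sketch of that architecture (not stable-baselines3 code):

```python
import torch
import torch.nn as nn

class BranchingQNet(nn.Module):
    """Shared trunk plus one Q-head per action dimension:
    sum(branches) outputs instead of prod(branches)."""

    def __init__(self, obs_dim, branches):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # One head per MultiDiscrete dimension, e.g. branches=[4, 3, 2].
        self.heads = nn.ModuleList(nn.Linear(64, n) for n in branches)

    def forward(self, obs):
        z = self.trunk(obs)
        return [head(z) for head in self.heads]

net = BranchingQNet(obs_dim=8, branches=[4, 3, 2])
q_per_branch = net(torch.randn(1, 8))
action = [q.argmax(dim=1).item() for q in q_per_branch]  # one sub-action per branch
```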

PBerit commented 2 years ago

@Miffyli: Thanks Miffyli for your answer. What do you mean by "Regular DQN only works with a Discrete action space, where it chooses one action from many, because it predicts the Q-value for every possible action"? Normally, a Neural Network can map any inputs to any outputs, and Deep Q-Learning uses a Neural Network. So why can the Stable Baselines3 DQN only have 1 action?

Miffyli commented 2 years ago

Normally, a Neural Network can map any inputs to any outputs, and Deep Q-Learning uses a Neural Network. So why can the Stable Baselines3 DQN only have 1 action?

I do not have time to explain the whole situation, and I recommend you read more on "Q-learning" (not DQN) and the original DQN paper. The gist is this:

- In Q-learning, the greedy policy picks the action with the highest Q-value: argmax_a Q(s, a).
- DQN implements this by having the network output one Q-value per possible action, so picking an action is just an argmax over the output vector. The output layer therefore hard-codes a single, flat Discrete set of actions.
- For a MultiDiscrete space you would need one output per combination of sub-actions, and that number is the product of the sizes of all dimensions, which grows very quickly.
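To illustrate the second point, here is a minimal, hypothetical Q-network for a Discrete(3) action space; the greedy action is simply an argmax over the three outputs, which is why the action set must be a fixed, flat list:

```python
import torch
import torch.nn as nn

# Hypothetical Q-network for Discrete(3): one output unit per action.
q_net = nn.Sequential(
    nn.Linear(4, 64),   # 4-dimensional (continuous) observation
    nn.ReLU(),
    nn.Linear(64, 3),   # one Q-value per discrete action
)

obs = torch.randn(1, 4)
q_values = q_net(obs)            # shape (1, 3): Q(s, a) for each action
action = q_values.argmax(dim=1)  # greedy action selection
```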

You may close this issue if you do not have enhancements/issues to report that are specific to stable-baselines :)

PBerit commented 2 years ago

Thanks Miffyli for your answer and help.