Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

[Question] Apply Masking using ActionMasker on composite actions #249

Closed mwalidcharrwi closed 3 months ago

mwalidcharrwi commented 3 months ago

❓ Question

Hello,

I am trying to apply ActionMasker to a spaces.MultiDiscrete action space with a MaskablePPO model, using the following action space definition: self.action_space = spaces.MultiDiscrete([len(list1), len(list2)])

However, I am getting a size mismatch when the action mask is applied to this spaces.MultiDiscrete action space.

I tried looking through the issues list of the repository, but what I found involves flattening the action, and I was wondering whether there is any class method to handle MultiDiscrete without the need to do it manually.

mwalidcharrwi commented 3 months ago

Dear @araffin

To provide more information about the issue, this is what I encounter when the above action_space is defined:

File "/Users/opt/anaconda3/envs/code/lib/python3.8/site-packages/sb3_contrib/common/maskable/distributions.py", line 240, in apply_masking
    masks = masks.view(-1, sum(self.action_dims))
RuntimeError: shape '[-1, 49]' is invalid for input of size 37

It is worth mentioning that the length of list1 is 37 and the length of list2 is 14.

I saw some suggestions about flattening the actions. However, I wanted to ask whether MaskablePPO can handle this without resizing or modifying the action dimensions at all.

araffin commented 3 months ago

> I tried looking through the issues list of the repository, but what I found involves flattening the action, and I was wondering whether there is any class method to handle MultiDiscrete without the need to do it manually.

I think there is a misunderstanding: if you look at the example env (in the repo) or at the issues, you will see that you need to flatten the action mask, not the actions.
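As a rough illustration of what "flatten the action mask" can look like, the sketch below builds one boolean mask per sub-action and concatenates them into a single 1D array. The helper name `flatten_multidiscrete_mask` and the sizes (3 and 2) are hypothetical and chosen only to keep the example small; it uses NumPy alone and does not call into sb3_contrib.

```python
import numpy as np


def flatten_multidiscrete_mask(per_dim_masks):
    """Concatenate one boolean mask per sub-action into a single 1D mask.

    Hypothetical helper for illustration, not part of sb3_contrib.
    """
    return np.concatenate([np.asarray(m, dtype=bool) for m in per_dim_masks])


# Hypothetical sizes: 3 choices for the first sub-action, 2 for the second,
# i.e. an action space like spaces.MultiDiscrete([3, 2]).
nvec = [3, 2]
mask_first = [True, False, True]   # choice 1 of the first sub-action is invalid
mask_second = [False, True]        # choice 0 of the second sub-action is invalid

flat_mask = flatten_multidiscrete_mask([mask_first, mask_second])

# The flat mask has one entry per choice across all sub-actions.
assert flat_mask.shape == (sum(nvec),)
```

A function returning an array built this way is the kind of thing one would pass as the mask function to `ActionMasker`. The action space itself stays a `MultiDiscrete`; only the mask is flat.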

mwalidcharrwi commented 3 months ago

So what would be an adequate mapping of a MultiDiscrete action_space to action masks, given two lists of different sizes?

Also, would the action mask be a 1D np array? Would the size of that array correspond to the length of one of the action_space components (one of these lists)?
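One way to picture the mapping being asked about, assuming the flat mask is expected to have `sum(action_dims)` entries as the `masks.view(-1, sum(self.action_dims))` line in the traceback suggests: the first block of entries covers the first list and the next block covers the second. The sketch below splits a flat mask back into per-component masks. The helper name `split_flat_mask` and the sizes are hypothetical, and this is NumPy-only illustration rather than the library's own code.

```python
import numpy as np


def split_flat_mask(flat_mask, nvec):
    """Split a flat 1D mask into one mask per MultiDiscrete component.

    Hypothetical helper for illustration, not part of sb3_contrib.
    """
    flat_mask = np.asarray(flat_mask, dtype=bool)
    expected = int(sum(nvec))
    if flat_mask.shape != (expected,):
        raise ValueError(
            f"mask has shape {flat_mask.shape}, expected ({expected},)"
        )
    # Split points are the cumulative sizes, excluding the final total.
    return np.split(flat_mask, np.cumsum(nvec)[:-1])


# Hypothetical sizes: 3 choices for the first component, 2 for the second.
nvec = [3, 2]
flat = [True, False, True, False, True]
first, second = split_flat_mask(flat, nvec)
# first covers indices 0..2 of the flat mask, second covers indices 3..4.
```

Under this reading, a mask whose length equals only one component's size (for example, only the first list) would fail the length check, which is consistent with the kind of shape error shown earlier in the thread.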

araffin commented 3 months ago

Closing as a duplicate of https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/80#issuecomment-2186409952