facebookresearch / hanabi_SAD

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

Where are the other-play equivalence mappings phi for actions and observations implemented? #34

Closed tessavdheiden closed 1 year ago

tessavdheiden commented 1 year ago

Hi Hengyuan,

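Could you please indicate where the observation and action mappings are implemented? These are from the paper:

[image: Screenshot 2023-04-04 at 09 07 02] https://user-images.githubusercontent.com/24938569/229714685-ca8300d7-d401-43d0-bd01-76dcec72479f.png

And where are they sampled and applied in an episode?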

Many thanks in advance!

Best, Tessa

hengyuan-hu commented 1 year ago

Hi,

It is implemented in https://github.com/facebookresearch/hanabi_SAD/blob/main/cpp/hanabi_env.cc

Check places with shuffleColor, colorPermutes, invColorPermutes_.
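For readers without the C++ source at hand, here is a minimal Python sketch of what a color permutation and its inverse could look like. It is not the repository's code: the function name `sample_color_permutation`, the constant `NUM_COLORS`, and the choice to draw one permutation up front are all assumptions made for illustration.

```python
import random

NUM_COLORS = 5  # standard Hanabi has five colors (assumed here)


def sample_color_permutation(rng, num_colors=NUM_COLORS):
    """Return (perm, inv_perm) such that inv_perm[perm[c]] == c for every color c.

    Hypothetical helper: perm maps a real color to the color the agent sees,
    inv_perm maps a color the agent names back to the real color.
    """
    perm = list(range(num_colors))
    rng.shuffle(perm)
    inv_perm = [0] * num_colors
    for real_color, seen_color in enumerate(perm):
        inv_perm[seen_color] = real_color
    return perm, inv_perm


# Draw one permutation with a seeded generator so the example is reproducible.
perm, inv_perm = sample_color_permutation(random.Random(0))
```

Keeping the inverse alongside the forward permutation avoids recomputing it every time an action has to be mapped back to real colors.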

Note that there is a better repo, “off-belief-learning”. Similar functionality is implemented in this file of that repo: https://github.com/facebookresearch/off-belief-learning/blob/main/rlcc/r2d2_actor.cc

Hope it helps!

tessavdheiden commented 1 year ago

Hey!

Thanks for the links, I was looking in the Python scripts.

So here you create vectors that permute the actions/moves:

[image: Screenshot 2023-04-05 at 09 45 36]

And here is where it is used in the step() function:

[image: Screenshot 2023-04-05 at 09 49 13]

Where do you permute/map the observations? Is it true that you first permute the observations, let the player act, and then permute the action back?

Best, Tessa

hengyuan-hu commented 1 year ago

The observation is permuted inside the function encoder_.Encode: https://github.com/facebookresearch/hanabi_SAD/blob/415804b531447bb4b8adb12100f994d588589cd8/cpp/hanabi_env.cc#L145

"Is it true that you first permute the observations, let the player act, and then permute the action back?"

Yes you are correct!
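To make the confirmed order concrete (permute the observation, let the player act, map the action back), here is a small Python sketch. It is a toy illustration rather than the code in hanabi_env.cc: the observation is reduced to a list of visible card colors, actions are `(kind, value)` tuples, and every function name is hypothetical. It also assumes that only color hints need to be mapped back, since a color permutation leaves play, discard, and rank-hint actions unchanged.

```python
def permute_observation(card_colors, perm):
    """Relabel each visible card color with the player's permutation."""
    return [perm[c] for c in card_colors]


def unpermute_action(action, inv_perm):
    """Map an action chosen in permuted color space back to real colors.

    Only color hints carry a color; all other actions pass through unchanged.
    """
    kind, value = action
    if kind == "hint_color":
        return (kind, inv_perm[value])
    return action


def act_with_permutation(card_colors, policy, perm, inv_perm):
    """Permute the observation, query the policy, then undo the permutation."""
    permuted_obs = permute_observation(card_colors, perm)
    permuted_action = policy(permuted_obs)
    return unpermute_action(permuted_action, inv_perm)


def hint_first_color(obs):
    """Toy policy: hint the color of the first visible card."""
    return ("hint_color", obs[0])


perm = [2, 0, 1, 4, 3]      # real color -> color the agent sees
inv_perm = [1, 2, 0, 4, 3]  # color the agent names -> real color

# The first card is really color 3; the agent sees it as 4, hints 4,
# and the hint is mapped back to the real color 3 before reaching the game.
real_action = act_with_permutation([3, 0, 1], hint_first_color, perm, inv_perm)
```

The environment only ever receives actions in real colors, so the permutation stays private to the player it was drawn for.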

tessavdheiden commented 1 year ago

Hey, thanks for the help!