Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

Fix bug in SB3 tutorial ActionMask #1203

Closed: dm-ackerman closed this 1 month ago

dm-ackerman commented 1 month ago

Description

SB3ActionMaskWrapper.step() is intended to be compatible with Gymnasium's interface, where step() returns observation, reward, termination, truncation, info.

This was implemented using the last() function, but last() returns the values for the current agent (the one about to act next), not for the agent that just acted, which is what Gymnasium's step() would return.

Among other things, this means the agent is trained on its opponent's reward, which encourages bad play.

The function now returns the reward, termination, truncation, and info values for the agent that just acted. It still returns the observation for the next agent, since that observation is used to choose the next action.
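To make the difference concrete, here is a minimal sketch built on a hypothetical toy environment, ToyTurnEnv, which is not part of PettingZoo. It imitates only a few AEC-style attributes (agent_selection, rewards, terminations, truncations, infos, observe(), last(), step()) and, for simplicity, uses per-step rewards where the real last() reports cumulative rewards. The helpers buggy_step and fixed_step are illustrative names, not the tutorial's actual code: the first reads everything through last() after stepping, the second records the acting agent before stepping and reads its outcome afterwards, while still observing for the next agent.

```python
from typing import Any, Dict, List, Tuple


class ToyTurnEnv:
    """Hypothetical two-player, turn-based env (NOT PettingZoo code).

    Mimics a few AEC attributes only, to show why last() after step()
    reports the wrong agent's reward.
    """

    def __init__(self) -> None:
        self.agents: List[str] = ["player_0", "player_1"]
        self.agent_selection = self.agents[0]
        self.rewards: Dict[str, float] = {a: 0.0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self.truncations = {a: False for a in self.agents}
        self.infos: Dict[str, dict] = {a: {} for a in self.agents}

    def observe(self, agent: str) -> str:
        return f"obs_for_{agent}"

    def last(self) -> Tuple[Any, float, bool, bool, dict]:
        # Like AEC last(): values for the agent that is ABOUT to act.
        a = self.agent_selection
        return (self.observe(a), self.rewards[a], self.terminations[a],
                self.truncations[a], self.infos[a])

    def step(self, action: int) -> None:
        actor = self.agent_selection
        opponent = self.agents[1 - self.agents.index(actor)]
        if action == 1:  # hypothetical "winning" move that ends the game
            self.rewards[actor], self.rewards[opponent] = 1.0, -1.0
            for a in self.agents:
                self.terminations[a] = True
        # The turn passes to the opponent, as in an AEC environment.
        self.agent_selection = opponent


def buggy_step(env: ToyTurnEnv, action: int):
    env.step(action)
    return env.last()  # reward/termination belong to the NEXT agent


def fixed_step(env: ToyTurnEnv, action: int):
    actor = env.agent_selection  # remember who is acting before stepping
    env.step(action)
    next_agent = env.agent_selection
    return (
        env.observe(next_agent),  # observation used to pick the next action
        env.rewards[actor],       # outcome for the agent that just acted
        env.terminations[actor],
        env.truncations[actor],
        env.infos[actor],
    )


print(buggy_step(ToyTurnEnv(), 1)[1])  # -1.0: the opponent's reward
print(fixed_step(ToyTurnEnv(), 1)[1])  # 1.0: the acting agent's reward
```

With the buggy version, the winning move is credited with -1.0, so a learner would be pushed away from it; the fixed version credits the mover with 1.0 while still handing back the observation the next agent needs.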

Fixes #1147


elliottower commented 1 month ago

Good catch, cheers