alshedivat / lola

Code release for Learning with Opponent-Learning Awareness and variations.
MIT License

Player blue and red are not currently symmetrical #9

Open Manuscrit opened 4 years ago

Manuscrit commented 4 years ago

In https://github.com/alshedivat/lola/blob/master/lola/envs/coin_game.py

The symmetry is broken in favor of player red. When both players move onto the cell with the coin in the same step, red always picks up the coin, because red's position is checked before blue's.

Rewritten without batching (as in my own implementation), the current logic is:

        if self.red_coin:
            if self._same_pos(self.red_pos, self.coin_pos):
                generate = True
                reward_red = 1
                reward_blue = 0
            elif self._same_pos(self.blue_pos, self.coin_pos):
                generate = True
                reward_red = -2
                reward_blue = 1
            else:
                reward_red = 0
                reward_blue = 0

        else:
            if self._same_pos(self.red_pos, self.coin_pos):
                generate = True
                reward_red = 1
                reward_blue = -2
            elif self._same_pos(self.blue_pos, self.coin_pos):
                generate = True
                reward_red = 0
                reward_blue = 1
            else:
                reward_red = 0
                reward_blue = 0

To restore the symmetry between red and blue, a tie should be broken at random, so this should be changed to:

        if self.red_coin:
            if self._same_pos(self.red_pos, self.coin_pos) and self._same_pos(self.blue_pos, self.coin_pos):
                if np.random.randint(0, 2):
                    generate = True
                    reward_red = 1
                    reward_blue = 0
                else:
                    generate = True
                    reward_red = -2
                    reward_blue = 1
            elif self._same_pos(self.red_pos, self.coin_pos):
                generate = True
                reward_red = 1
                reward_blue = 0
            elif self._same_pos(self.blue_pos, self.coin_pos):
                generate = True
                reward_red = -2
                reward_blue = 1
            else:
                reward_red = 0
                reward_blue = 0

        else:
            if self._same_pos(self.red_pos, self.coin_pos) and self._same_pos(self.blue_pos, self.coin_pos):
                if np.random.randint(0, 2):
                    generate = True
                    reward_red = 1
                    reward_blue = -2
                else:
                    generate = True
                    reward_red = 0
                    reward_blue = 1
            elif self._same_pos(self.red_pos, self.coin_pos):
                generate = True
                reward_red = 1
                reward_blue = -2
            elif self._same_pos(self.blue_pos, self.coin_pos):
                generate = True
                reward_red = 0
                reward_blue = 1
            else:
                reward_red = 0
                reward_blue = 0
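The same rule can also be sketched more compactly as a standalone function. This is only an illustration under stated assumptions: `coin_rewards` is a hypothetical helper that does not exist in the repository, positions are plain tuples rather than arrays, and the standard-library `random.Random` stands in for `np.random.randint` so the snippet has no dependencies. The reward values are the ones from the snippets above.

```python
import random


def coin_rewards(red_pos, blue_pos, coin_pos, red_coin, rng):
    """Return (reward_red, reward_blue, generate) for a single step.

    Hypothetical helper for illustration; not part of the repository.
    """
    red_on = red_pos == coin_pos
    blue_on = blue_pos == coin_pos

    if red_on and blue_on:
        # Tie: choose the collecting player uniformly at random.
        red_picks = rng.randrange(2) == 0
    elif red_on or blue_on:
        red_picks = red_on
    else:
        # Nobody reached the coin, so no reward and no new coin.
        return 0, 0, False

    if red_picks:
        # Red collects: +1 for red, -2 for blue only if the coin was blue's.
        return 1, (0 if red_coin else -2), True
    # Blue collects: +1 for blue, -2 for red only if the coin was red's.
    return (-2 if red_coin else 0), 1, True


# Both players on the coin: red should win roughly half of the ties.
rng = random.Random(0)
cell = (1, 1)
trials = 10_000
red_wins = sum(
    coin_rewards(cell, cell, cell, True, rng)[0] == 1 for _ in range(trials)
)
print(red_wins / trials)
```

Passing the random generator in as an argument is a design choice of this sketch: it keeps the tie-break reproducible in tests without touching global random state.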

I can open a pull request if needed.

alshedivat commented 4 years ago

@Manuscrit, thanks for pointing this out. It would be great if you could submit a PR that fixes this.

In fact, since lola_dice is the more recent and more efficient implementation, #5 fixed this issue in lola_dice/envs/coin_game.py but not in lola/envs/coin_game.py (this was actually pointed out in #7). The ideal fix would be to move lola_dice/envs to the root directory, rename it lola_envs, and make sure that both lola and lola_dice correctly import environments from lola_envs.