Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

Chess Environment Trailing History #644

Closed 01jongmin closed 2 years ago

01jongmin commented 2 years ago

I was trying to read through and understand the chess environment before attempting to create my own environment for a MARL problem. Reading through the step function, I think the environment is keeping a trailing history: the board state it records lags one move behind the actual board.

    def step(self, action):
        if self.dones[self.agent_selection]:
            return self._was_done_step(action)
        current_agent = self.agent_selection
        current_index = self.agents.index(current_agent)
        next_board = chess_utils.get_observation(self.board, current_agent)
        self.board_history = np.dstack((next_board[:, :, 7:], self.board_history[:, :, :-13]))
        ...

When the step function is first called for the white pieces, the starting board is added to board_history, because this line runs before any move is made. When the observe function is then called for the black pieces, the observation fails to include the first move made by white. Is there anything I am missing?
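To illustrate the behavior described above, here is a minimal numpy sketch of the history-push logic (toy 2x2 boards with 1 plane each instead of the real env's 8x8 boards with 13 planes; `step_like` is a hypothetical stand-in for the quoted step function, not library code):

```python
import numpy as np

PLANES = 1   # planes per board snapshot (the real env uses 13)
HISTORY = 4  # snapshots kept (the real env keeps 8)

# History buffer starts out all zeros, like np.zeros((8, 8, 104)) in the env.
history = np.zeros((2, 2, PLANES * HISTORY), dtype=int)

def step_like(board_before_move):
    """Mimics the quoted step(): the *pre-move* board is pushed into history."""
    global history
    history = np.dstack((board_before_move, history[:, :, :-PLANES]))

board = np.zeros((2, 2, PLANES), dtype=int)  # starting position
step_like(board)                             # white's step pushes the START position
board = board + 1                            # white's move changes the board AFTER the push

# The newest snapshot black would observe is still the starting position,
# not the board after white's move.
newest = history[:, :, :PLANES]
print((newest == board).all())  # False: the history trails the real board by one move
```

This reproduces the one-move lag in miniature: the snapshot pushed during white's step predates white's move.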

benblack769 commented 2 years ago

To clarify the intention here:

The observation for the first step by the white pieces should not include any information. When you play the opening move in chess, all you see is the starting position.

The first step for the black pieces should include the state after the move made by the white player.

More information here: https://www.pettingzoo.ml/classic/chess

01jongmin commented 2 years ago

I understand the intention of the library; my only confusion is why the observation following a call to the step function fails to show the correct board state in the observation ndarray.

When I run the following code snippet

from pettingzoo.classic import chess_v5
env = chess_v5.env()
env.reset()
env.step(77)

The env.render() function correctly shows the board in the state after the white player moves the left-most pawn up by one square.

Calling env.observe("player_1")["observation"][:, :, 7], however, returns the positions of the pawns in the starting position. According to the intention, shouldn't channel 7 contain the board state with one of the pawns moved up by one? Otherwise, the black player ("player_1") would have to make moves without knowing which move the white player made previously. I also checked env.observe("player_1")["observation"][:, :, 13], which likewise shows the identical starting board.

Lastly, only after another call to env.step(77) is made do I observe that channels 7-18 (env.observe("player_1")["observation"][:, :, 7:19]) in the observation array contain the board state identical to the one after the white player made the opening move.

benblack769 commented 2 years ago

Thanks for clarifying, this does sound like a problem. I'll look into it.

01jongmin commented 2 years ago

Thank you for taking the time to confirm my suspicion. I can try to create a branch and open up a PR if you don't mind!

benblack769 commented 2 years ago

@01jongmin So I looked into this, and it appears that the problem is that the observation space changed fairly dramatically in this PR: https://github.com/Farama-Foundation/PettingZoo/pull/430, and the documentation was never updated.

benblack769 commented 2 years ago

We can try to update the documentation soon.

01jongmin commented 2 years ago

I looked into the PR and believe the 8-step history it implemented is itself trailing. The clearest example: the __init__(self) function sets self.board_history = np.zeros((8, 8, 104), dtype=bool), which doesn't contain the initial board state, and the reset(self) function doesn't change this. Then, in the observe function, the library stacks the first seven layers of the board state (containing non-board-state data like castling rights, etc.) with board_history, which fails to include the initial board state. From my understanding, this effectively causes the first call to observe by player_0 to not include the starting board state.
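One possible fix consistent with the analysis above (a hedged sketch in the same toy shapes as before, not the actual code from the linked PR) would be to seed the history with the starting position on reset and push the *post-move* board in step, so the newest snapshot always matches the current position:

```python
import numpy as np

PLANES = 1   # planes per board snapshot (the real env uses 13)
HISTORY = 4  # snapshots kept (the real env keeps 8)

def reset_like(start_board):
    # Seed the history with the starting position instead of all zeros.
    pad = np.zeros((2, 2, PLANES * (HISTORY - 1)), dtype=int)
    return np.dstack((start_board, pad))

def step_like(history, board_after_move):
    # Push the board as it stands AFTER the move was applied.
    return np.dstack((board_after_move, history[:, :, :-PLANES]))

board = np.zeros((2, 2, PLANES), dtype=int)  # starting position
history = reset_like(board)
board = board + 1                            # white moves
history = step_like(history, board)          # push the post-move state

# The newest snapshot now matches the current board, so black's first
# observation would include white's opening move.
print((history[:, :, :PLANES] == board).all())  # True
```

The key design change is ordering: snapshot the board after mutating it, and make reset responsible for the initial snapshot.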

jjshoots commented 2 years ago

Hi @benblack769 @01jongmin, what's the current status on this issue and is there anything that should be done to fix it?

benblack769 commented 2 years ago

Yeah, there is definitely something off about this logic. The problem seems to be very similar to what @jackyoung96 was dealing with in supersuit with frame stack, so perhaps he might be interested in taking a look. Or, I might find some time to look into this.

jackyoung96 commented 2 years ago

Oh, I missed this message. I will fix it ASAP.

jjshoots commented 2 years ago

Hi @jackyoung96, it's been a while since this was last visited; what's the progress so far?

jackyoung96 commented 2 years ago

I've started fixing it in PR #714. Please let me know if this fix is not what you intended.