Closed 01jongmin closed 2 years ago
To clarify the intention here:
The observation for the first step by the white peices should not include any information. When you play the opening move in chess, all you see is the starting position.
The first step for the black peices should include the state after the move made by the white player.
More information here: https://www.pettingzoo.ml/classic/chess
I understand the intention of the library, my only confusion is why the observation following a call to the step
function fails to show the correct board state in the observation ndarray
When I run the following code snippet
from pettingzoo.classic import chess_v5
env = chess_v5.env()
env.reset()
env.step(77)
The env.render()
function correctly shows the board to be in a state after which the white player moves the left-most pawn up by one spot.
Calling env.observe("player_1")["observation"][:, :, 7]
, however, returns the positions of the pawns in the starting position. According to the intention, shouldn't channel 7 contain the board state with one of the pawns being moved up by one? If this wasn't the case, the black player ("agent_1") would have to make moves without knowing which move the white player made previously. I also checked for env.observe("player_1")["observation"][:, :, 13]
which also shows the identical starting board.
Lastly, only after another call to env.step(77)
is made, do I observe that channels 7-18 (env.observe("player_1")["observation"][:,:, 7:19]
) in the observation array contain the board state identical to the one after the white player made the starting move.
Thanks for clarifying, this does sound like a problem. I'll look into it.
Thank you for taking the time to confirm my suspicion. I can try to create a branch and open up a PR if you don't mind!
@01jongmin So I looked into this, and it appears that the problem is that the observation space changed fairly dramatically in this PR: https://github.com/Farama-Foundation/PettingZoo/pull/430, and the documentation was never updated.
We can try to update the documentation soon.
I looked into the PR and believe that the way that the PR implemented the 8 step history itself might be trailing. The clearest example is how the __init__(self)
function that sets self.board_history = np.zeros((8, 8, 104), dtype=bool)
doesn't contain the initial board state, and the reset(self)
function doesn't alter this. Then in the observe
function, the library stack the first seven layers (containing non-board state data like castling rights, etc.) of the board state, with board_history
which fails to include the initial board state. From my understanding, this effectively causes the first call to observe
by player_0
to not include the starting board state.
Hi @benblack769 @01jongmin, what's the current status on this issue and is there anything that should be done to fix it?
Yeah, there is definitely something off about this logic. The problem seems to be very similar to what @jackyoung96 was dealing with in supersuit with frame stack, so perhaps he might be interested in taking a look. Or, I might find some time to look into this.
Oh, I miss this message. I will fix it ASAP
Hi @jackyoung96, it's been awhile since this has been visited, what's the progress on this so far?
I start to fix it in PR #714. Please let me know if this fix is not what you intended.
I was trying to read through and understand the chess environment before attempting to create my own environment for MARL problem. I was reading through the step function and think that the environment is archiving a trailing history.
When the step function is first called by the white pieces, the starting board is added to the
board_history
as it is run prior to any move being made. When theobserve
function is then run by the black pieces, the observation would fail to include the first move made by white pieces. Is there anything I am missing?