Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

[Bug Report] Cannot copy game state in Tic Tac Toe #1186

Status: Closed (bsgreenb closed this issue 1 month ago)

bsgreenb commented 7 months ago

Describe the bug

Creating a Tic Tac Toe player requires simulating hypothetical board states. To do this, my idea is to copy the state of the board at the root node, creating a new environment where I could simulate a hypothetical move.

I don't see any built-in function for this, so I tried copy.deepcopy() to see whether it would copy the board state for me (see code below).

This gives an error: AttributeError: 'raw_env' object has no attribute '_cumulative_rewards'

Clearly, deepcopy is not fully copying the board state. So how am I supposed to consider hypothetical board positions starting from a given state? Technically I could replay every action in the history to get the new board there, but that seems wildly inefficient, especially since I plan to play more complex games in the gym environment after this.

Code example

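import copy
from pettingzoo.classic import tictactoe_v3

# Create and reset the environment so it holds a live game state
env = tictactoe_v3.env(render_mode=None)
env.reset(seed=1)

# Copy the environment, then try to play a move on the copy
env = copy.deepcopy(env)
env.step(0)  # raises AttributeError: 'raw_env' object has no attribute '_cumulative_rewards'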

System info

PettingZoo version 1.24.3

dm-ackerman commented 6 months ago

I believe the behaviour you're seeing is caused by the EzPickle class in gymnasium. Your call to deepcopy triggers the pickling hooks (__getstate__ and __setstate__) in that class. Unfortunately, that seems to create a new environment with the same constructor arguments as the original, but not the same state. That is presumably why the copy is missing attributes such as _cumulative_rewards, and it's obviously not what you want.
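To make the mechanism concrete, here is a minimal sketch that reproduces the same symptom using only the standard library. `EzPickleLike` and `ToyEnv` are hypothetical stand-ins written for illustration; they are a simplified imitation of the idea described above, not gymnasium's or PettingZoo's actual code.

```python
import copy


class EzPickleLike:
    """Hypothetical stand-in: serializes constructor arguments only, not live state."""

    def __init__(self, *args, **kwargs):
        self._ctor_args = args
        self._ctor_kwargs = kwargs

    def __getstate__(self):
        # Only the constructor arguments are handed to pickle/deepcopy.
        return {"args": self._ctor_args, "kwargs": self._ctor_kwargs}

    def __setstate__(self, state):
        # Rebuild a brand-new object from those arguments and adopt its attributes.
        fresh = type(self)(*state["args"], **state["kwargs"])
        self.__dict__.update(fresh.__dict__)


class ToyEnv(EzPickleLike):
    """Hypothetical environment whose game state only exists after reset()."""

    def __init__(self, size=3):
        EzPickleLike.__init__(self, size=size)
        self.size = size

    def reset(self):
        self.board = [0] * (self.size * self.size)
        self._cumulative_rewards = {"player_1": 0, "player_2": 0}


env = ToyEnv(size=3)
env.reset()

clone = copy.deepcopy(env)

# The clone was rebuilt from constructor arguments, so reset() never ran on it.
copy_has_rewards = hasattr(clone, "_cumulative_rewards")
print(copy_has_rewards)
```

In this toy version the copy keeps `size` but has no `board` or `_cumulative_rewards`, which mirrors the AttributeError in the report. Calling `reset()` on the copy would restore the attributes, but with a fresh game rather than the copied position.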

If you need to simulate hypothetical moves for what you're doing, you could replay the action history step by step, as you mention. Alternatively, you could add functions to get and set (save and load) the state. I'm not sure if there is a better option.
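Both options can be sketched on a toy environment. Everything below is hypothetical and written for illustration: `ToyBoardEnv`, `replay`, `get_state` and `set_state` are assumed names, not part of the PettingZoo API, and a real environment would need to snapshot more fields (rewards, terminations, agent selection, and so on).

```python
import copy


class ToyBoardEnv:
    """Hypothetical stand-in for a two-player, turn-based board environment."""

    def reset(self):
        self.board = [0] * 9   # 0 = empty, 1 and 2 = player marks
        self.turn = 0          # 0 -> player 1 moves next, 1 -> player 2
        self.history = []      # actions taken since reset

    def step(self, action):
        if self.board[action] != 0:
            raise ValueError(f"cell {action} is already taken")
        self.board[action] = self.turn + 1
        self.turn = 1 - self.turn
        self.history.append(action)

    def get_state(self):
        # Return an independent snapshot of everything that defines the position.
        return copy.deepcopy(
            {"board": self.board, "turn": self.turn, "history": self.history}
        )

    def set_state(self, state):
        # Restore from a snapshot without aliasing the caller's copy.
        state = copy.deepcopy(state)
        self.board = state["board"]
        self.turn = state["turn"]
        self.history = state["history"]


def replay(make_env, actions):
    """Option 1: rebuild a position by replaying the action history."""
    env = make_env()
    env.reset()
    for action in actions:
        env.step(action)
    return env


root = ToyBoardEnv()
root.reset()
root.step(4)
root.step(0)

# Option 1: an independent environment at the same position, via replay.
replayed = replay(ToyBoardEnv, root.history)

# Option 2: snapshot, try a hypothetical move, then roll back.
snapshot = root.get_state()
root.step(8)
root.set_state(snapshot)
```

Replay costs one step per past action for every hypothetical position, whereas the snapshot approach costs one copy of the state, which is the efficiency concern raised in the original report.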

You might try looking at the gymnasium issues to see if there is a better method, or posting on the Discord. Some relevant issues from gymnasium:
https://github.com/Farama-Foundation/Gymnasium/issues/737
https://github.com/Farama-Foundation/Gymnasium/issues/94

elliottower commented 6 months ago

There are some environments in PettingZoo with a get-state function (mentioned in that second issue), but as Mark says in the first one, it's a bit late to add a feature like this, and it would be complicated.

I did something like what you describe (considering all possible board states) in a more manual way with a custom env I made: I just copied the env.board object and then did calculations based on that. You could probably just do the deep copy and then modify env.board and any other important objects, such as the agent selection, rewards, etc. It's a highly unusual thing to do, though, since most training methods do not enumerate all possibilities, so I think it shouldn't be in the API and should just be done manually if need be.
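As a rough illustration of the manual approach, the sketch below copies a board and tries hypothetical moves on the copy. The board layout is an assumption made for this example (a flat list of 9 cells in row-major order, with 0 for empty and 1 or 2 for the players' marks), and `winner` and `winning_moves` are hypothetical helpers; the real env.board object may be structured differently.

```python
import copy

# All eight three-in-a-row lines on a 3x3 board stored as a flat, row-major list.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(squares):
    """Return the mark (1 or 2) that has completed a line, or 0 if nobody has."""
    for a, b, c in WIN_LINES:
        if squares[a] != 0 and squares[a] == squares[b] == squares[c]:
            return squares[a]
    return 0


def winning_moves(squares, mark):
    """Try each empty cell on a copy of the board; collect cells that win at once."""
    moves = []
    for cell, value in enumerate(squares):
        if value != 0:
            continue
        trial = copy.copy(squares)  # the hypothetical move never touches the original
        trial[cell] = mark
        if winner(trial) == mark:
            moves.append(cell)
    return moves


squares = [1, 1, 0,
           2, 2, 0,
           0, 0, 0]

first_player_wins = winning_moves(squares, 1)
second_player_wins = winning_moves(squares, 2)
print(first_player_wins, second_player_wins)
```

The same pattern extends to deeper search (for example minimax) by recursing on `trial` instead of stopping after one move.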