Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

[Illegal moves] Illegal moves made by tictactoe agent #863

Closed wei-ann-Github closed 1 year ago

wei-ann-Github commented 1 year ago

Question

Hi,

I followed the script https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Tianshou/2_training_agents.py to train a tictactoe agent. After training the model, I tried to play against the trained agent, but the agent appears to be making illegal moves. My code:

state_shape = env.observation_space["observation"].shape
action_shape = env.action_space.n

net = Net(
    state_shape=state_shape,
    action_shape=action_shape,
    hidden_sizes=[128, 128, 128, 128],
    device="cuda" if torch.cuda.is_available() else "cpu",
).to("cuda" if torch.cuda.is_available() else "cpu")
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
policy = DQNPolicy(
    model=net,
    optim=optim,
    discount_factor=0.9,
    estimation_step=3,
    target_update_freq=320,
)

policy.load_state_dict(torch.load(train_path))
agents = env.agents
agent = agents[0]
new_game = True

policy.eval()
while not done:
    action = env.action_space.sample()
    if new_game:
        action = env.action_space.sample()
    else:
        observation['obs'] = observation['obs'].reshape(-1, int(np.prod(state_shape)))  # Reshape observation
        action = policy(Batch(**observation)).act[0]

    if not new_game or agent == agents[0]:
        observation, reward, done, truncated, info = env.step(action)

    if not done:
        player_action = int(input('User input starts with 1 to 7: ')) - 1
        observation, reward, done, truncated, info = env.step(player_action)
        observation['info'] = info

    new_game = False

Error message

>> [WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.
mask after agent moves: [False, False, False, False, False, False, False, False, False]

I've checked the mask. It looks correct.

Anyone able to help? I'll post this in Tianshou as well.

zbenmo commented 1 year ago

If you sample from the action space, this can be just any action in the space. Maybe this is the issue? Tianshou policy is aware of the mask and should not return an illegal action.
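
As a rough illustration of what "aware of the mask" can mean for a greedy policy, here is a minimal pure-Python sketch. The helper name `masked_greedy_action` and the example values are hypothetical, and this is not Tianshou's actual implementation; it only shows the general idea of excluding illegal actions before taking the argmax.

```python
import math


def masked_greedy_action(q_values, action_mask):
    """Return the index of the highest-valued legal action.

    Hypothetical helper for illustration only; not Tianshou's code.
    """
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("no legal actions: the mask is all False")
    # Give every illegal action a value of -inf so it can never win the argmax.
    masked = [q if legal else -math.inf for q, legal in zip(q_values, action_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# 3x3 board flattened to 9 cells; cells 0 and 4 are assumed to be taken.
q_values = [0.9, 0.1, 0.3, 0.2, 0.8, 0.5, 0.0, 0.4, 0.1]
action_mask = [False, True, True, True, False, True, True, True, True]
best = masked_greedy_action(q_values, action_mask)
```

Without the mask, the argmax here would be cell 0, which is occupied. Note that an all-False mask, like the one printed in the error message, leaves no legal action to choose from at all.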

zbenmo commented 1 year ago

I have also demonstrated self-play with Tianshou and PettingZoo Tic-Tac-Toe in my own repository. See the examples in https://github.com/zbenmo/turingpoint.

elliottower commented 1 year ago

> If you sample from the action space, this can be just any action in the space. Maybe this is the issue? Tianshou policy is aware of the mask and should not return an illegal action.

To add to this: calling action_space.sample() returns any action in the space, whether or not it is legal in the current state. To avoid illegal moves, call action_space.sample(mask) instead.
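
To make the difference concrete, here is a small pure-Python sketch of what masked sampling amounts to: choose uniformly among the actions the mask marks as legal. The helper `sample_legal_action` is hypothetical and only stands in for the library call. In PettingZoo itself the mask would normally come from the observation's `action_mask` entry, and Gymnasium's `Discrete.sample` expects it as a NumPy `int8` array.

```python
import random


def sample_legal_action(action_mask, rng=random):
    """Pick a uniformly random action among those the mask marks as legal.

    Hypothetical stand-in for masked sampling; not the library implementation.
    """
    legal = [i for i, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("no legal actions: the mask is all zeros")
    return rng.choice(legal)


# Assume only cells 1, 4 and 8 are still free on the flattened 3x3 board.
mask = [0, 1, 0, 0, 1, 0, 0, 0, 1]
rng = random.Random(0)  # seeded so repeated runs draw the same sequence
samples = [sample_legal_action(mask, rng) for _ in range(100)]
```

Every value in `samples` is one of the free cells, whereas an unmasked sample could land on any of the nine cells, including occupied ones.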