Closed wei-ann-Github closed 1 year ago
If you sample from the action space, you can get any action in the space, including illegal ones. Maybe that is the issue? The Tianshou policy is aware of the mask and should not return an illegal action.
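To illustrate what "aware of the mask" means for a value-based policy, here is a minimal sketch in plain Python. It is not Tianshou's actual code; the function name `masked_greedy_action` and the sample Q-values are hypothetical. The idea is only that illegal actions are excluded before the best action is picked.

```python
import math


def masked_greedy_action(q_values, action_mask):
    """Return the index of the highest-valued legal action.

    Hypothetical helper for illustration: q_values holds one score per
    action, action_mask holds 1 for legal actions and 0 for illegal ones.
    """
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("no legal action available")
    # Illegal actions are scored -inf so they can never win the argmax.
    masked = [q if legal else -math.inf for q, legal in zip(q_values, action_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# Tic-tac-toe board with cells 0 and 4 already taken (made-up values).
q_values = [0.9, 0.1, 0.3, 0.2, 0.8, 0.0, 0.4, 0.1, 0.6]
action_mask = [0, 1, 1, 1, 0, 1, 1, 1, 1]
print(masked_greedy_action(q_values, action_mask))  # prints 8
```

Without the mask, the greedy choice here would be cell 0, which is already occupied.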
I have also demonstrated self-play with Tianshou and PettingZoo Tic-Tac-Toe in my own repository. See the examples in https://github.com/zbenmo/turingpoint.
To add to this: calling `action_space.sample()` returns any action in the space, legal or not. To avoid illegal moves, call `action_space.sample(mask)` instead.
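As a rough picture of what masked sampling does, here is a plain-Python sketch. It is not Gymnasium's implementation, and `sample_legal_action` is a hypothetical name. In a PettingZoo environment the mask usually comes from the observation (for example `observation["action_mask"]`) and is passed to the agent's action space as `sample(mask)`.

```python
import random


def sample_legal_action(action_mask, rng=random):
    """Pick uniformly among the actions whose mask entry is non-zero.

    Hypothetical helper for illustration: action_mask holds 1 for legal
    actions and 0 for illegal ones; rng is anything with a choice() method.
    """
    legal = [index for index, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("no legal action available")
    return rng.choice(legal)


# Only cells 1, 4 and 8 are still free on this made-up board.
action_mask = [0, 1, 0, 0, 1, 0, 0, 0, 1]
action = sample_legal_action(action_mask)
print(action in (1, 4, 8))  # prints True
```

Sampling without the mask would draw from all nine cells, which is how an occupied cell can be selected.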
Question
Hi,
I followed the script https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Tianshou/2_training_agents.py to train a tictactoe agent. After training the model, I tried to play against the trained agent, but it seems that the agent is making illegal moves. My code
Error message
I've checked the mask. It looks correct.
Anyone able to help? I'll post this in Tianshou as well.