Open seanyang0813 opened 8 months ago
This is so cool!
Everything I've read about reinforcement learning says it's super important to figure out the right reward mechanism, with rewards given at incremental steps instead of just at the very end. Maybe having some reward for winning a sub-board would make it better.
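A rough sketch of what a sub-board reward could look like. The function names, the cell encoding (0 empty, 1 and 2 for the players), and the 0.1 bonus are all assumptions for illustration, not taken from the repo:

```python
# Hypothetical reward shaping sketch: names and values are assumptions.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def sub_board_winner(cells):
    """Return 1 or 2 if that player holds a full line in a 9-cell sub-board, else 0."""
    for a, b, c in WIN_LINES:
        if cells[a] != 0 and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return 0


def shaped_reward(sub_before, sub_after, player, game_winner, sub_bonus=0.1):
    """Reward for `player` after one move.

    sub_before / sub_after: the 9 cells of the sub-board that was played in,
    before and after the move.
    game_winner: 0 while the game is still running, else the winner's id.
    """
    reward = 0.0
    # Small intermediate reward the moment this move wins a sub-board.
    if sub_board_winner(sub_before) == 0 and sub_board_winner(sub_after) == player:
        reward += sub_bonus
    # Terminal reward, as in the end-of-game-only setup.
    if game_winner == player:
        reward += 1.0
    elif game_winner != 0:
        reward -= 1.0
    return reward
```

The bonus is kept small relative to the terminal reward so the agent is not tempted to collect sub-boards at the cost of losing the game; the right ratio would have to be tuned.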
Congratz, you won $100! Let me know how you want to receive the money by reaching out to me on twitter (@vjeux) or email: vjeuxx@gmail.com
Description
Let me preface this by saying this is a really weak bot. Its win, draw, and loss rates are 86%, 8%, and 5% against a random bot, and I can easily beat it as a human.

The multi-agent training environment is modified from https://pettingzoo.farama.org/environments/classic/tictactoe/ and the training code is modified from https://github.com/thu-ml/tianshou/blob/master/test/pettingzoo/tic_tac_toe.py. I implemented the rules of ultimate tic tac toe and the UI myself.

It uses a reinforcement learning algorithm that takes in the board state and outputs an action, restricted by the legal-move masks. The network was trained for over 50 million steps if I recall correctly, and one game can have at most 81 steps, so it played a lot of games but it's still super weak.

I think that's because I trained it against a random bot, so it's only learning how to beat a random bot. Even so, I was surprised that it didn't get to at least 99% against the random bot. This is potentially because the reward is collected only at the end of the whole game and many actions are taken in between, so it's quite hard to tell which moves deserve the credit.

Vanilla DQN also converges quite slowly, so I could potentially try PPO or Rainbow DQN. I could also implement a better environment for self-play instead of constantly playing against a random bot.
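For readers unfamiliar with the masks mentioned above, here is a minimal sketch of how a legal-move mask for ultimate tic tac toe can be computed. It assumes a flat 81-cell board indexed as `sub_board * 9 + cell` and the common rule that the cell you play in picks the sub-board your opponent must play in next (with a free choice if that sub-board is already won or full). The layout and function names are assumptions, not necessarily what the repo does:

```python
# Hypothetical legal-move mask sketch: board layout and names are assumptions.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def line_winner(cells):
    """Return 1 or 2 if that player holds a full line in 9 cells, else 0."""
    for a, b, c in WIN_LINES:
        if cells[a] != 0 and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return 0


def legal_move_mask(board, last_move):
    """Return a list of 81 booleans, True where a move is allowed.

    board: 81 ints (0 empty, 1 or 2 for the players), index = sub_board * 9 + cell.
    last_move: index of the previous move, or None at the start of the game.
    """
    def decided(sb):
        cells = board[sb * 9:(sb + 1) * 9]
        return line_winner(cells) != 0 or all(c != 0 for c in cells)

    open_boards = [sb for sb in range(9) if not decided(sb)]
    if last_move is not None:
        target = last_move % 9  # the cell position picks the next sub-board
        if target in open_boards:
            open_boards = [target]
        # Otherwise the target is won or full, so any open sub-board is allowed.

    mask = [False] * 81
    for sb in open_boards:
        for cell in range(9):
            idx = sb * 9 + cell
            mask[idx] = board[idx] == 0
    return mask
```

With a mask like this, the agent only has to pick the highest-valued action among the entries marked True, which is how masked action selection usually works for DQN-style agents.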
Video
Screencast from 03-10-2024 11:13:04 PM.webm
This is me clicking randomly against the bot. If I play seriously I can easily beat it. Allowed moves for the player are shown in light blue, grids won by the player in green, grids won by the bot in red, and tied grids in yellow.
Source code
https://github.com/seanyang0813/ultimate-tic-tac-toe-rl
How to try it
Pull the repo above and install the dependencies from requirements.txt as described in the README.
Running the game UI
For the benchmark, run
For training