seanyang0813 commented 8 months ago

Description

Let me preface this by saying that this is a really weak bot. Its win, draw, and loss rates against a random bot are 86%, 8%, and 5%, and I can easily beat it as a human.

Screenshot from 2024-03-10 22-44-49

The multi-agent training environment is modified from https://pettingzoo.farama.org/environments/classic/tictactoe/ and the training code is modified from https://github.com/thu-ml/tianshou/blob/master/test/pettingzoo/tic_tac_toe.py. I implemented the rules of ultimate tic-tac-toe and the UI myself.

It uses a reinforcement learning algorithm that takes in the board state and outputs an action, restricted by the action masks. The network was trained for over 50 million steps, if I recall correctly, and one game has at most 81 steps. So it played a lot of games, but it's still super weak.

I think that's because I trained it against a random bot, so it only learns how to beat a random bot. Even so, I was surprised that it didn't reach at least a 99% win rate against the random bot. This is potentially because the reward is only collected at the end of the whole game, with many actions taken in between, so it's hard to tell which moves deserve the credit. Vanilla DQN also converges quite slowly; I could try PPO or Rainbow DQN. I would also like to implement a better environment for self-play instead of constantly playing against the random bot.
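To make the "action based on the masks" step concrete, here is a minimal sketch of how a legal-move mask for ultimate tic-tac-toe could be built and applied to a network's Q-values. It is not taken from the repo: the board encoding (a 9x9 array indexed by sub-board and cell, with 0 meaning empty), the `decided` flags, and the function names `legal_action_mask` and `masked_greedy_action` are all assumptions made for illustration.

```python
import numpy as np


def legal_action_mask(cells, decided, last_cell):
    """Return a boolean mask of length 81 over actions (sub_board * 9 + cell).

    Assumed encoding (hypothetical, not the repo's):
      cells     -- shape (9, 9), cells[b][c] == 0 means cell c of sub-board b is empty
      decided   -- shape (9,), True if that sub-board is already won or tied
      last_cell -- cell index (0..8) of the previous move, or None on the first move
    """
    cells = np.asarray(cells)
    decided = np.asarray(decided, dtype=bool)
    empty = cells == 0
    # A sub-board can be played in if it is undecided and has an empty cell.
    open_boards = ~decided & empty.any(axis=1)
    if last_cell is not None and open_boards[last_cell]:
        # The previous move sends the player to one specific sub-board.
        allowed = np.zeros(9, dtype=bool)
        allowed[last_cell] = True
    else:
        # First move, or the target sub-board is closed: any open sub-board.
        allowed = open_boards
    return (empty & allowed[:, None]).reshape(81)


def masked_greedy_action(q_values, mask):
    """Pick the highest-valued action among the legal ones."""
    q = np.where(mask, np.asarray(q_values, dtype=float), -np.inf)
    return int(np.argmax(q))
```

With this encoding, a move in cell 4 of sub-board 0 leaves exactly the nine cells of sub-board 4 (actions 36 to 44) legal for the opponent. The mask only removes illegal moves; it says nothing about which legal move is good, which is the part the network has to learn.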

Video

Screencast from 03-10-2024 11:13:04 PM.webm

This is me clicking randomly against the bot. If I play seriously, I can easily beat it. Moves the player is allowed to make are shown in light blue, sub-boards won by the player in green, sub-boards won by the bot in red, and tied sub-boards in yellow.

Source code

https://github.com/seanyang0813/ultimate-tic-tac-toe-rl

How to try it

Clone the repo above, then install the dependencies listed in requirements.txt, as described in the README:

pip install -r requirements.txt

Running the game UI

python game_display.py

For the benchmark, run

python benchmark.py

For training

python tianshou_train.py

vjeux commented 8 months ago

This is so cool!

Everything I've read about reinforcement learning says that it's super important to get the reward mechanism right, with rewards for incremental steps instead of only at the very end. Maybe adding some reward for winning a sub-board would make it better.
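As a rough sketch of what a sub-board reward could look like (this is not the submission's code, and the function names, the 0/1/2 player encoding, and the `bonus` value of 0.1 are all assumptions picked for illustration):

```python
# The eight winning lines of a 3x3 board, with cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def sub_board_winner(cells):
    """Return 1 or 2 if that player has three in a row in `cells`, else 0.

    `cells` is a sequence of 9 ints (assumed encoding: 0 empty, 1 or 2 = player).
    """
    for a, b, c in WIN_LINES:
        if cells[a] != 0 and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return 0


def shaped_reward(winners_before, winners_after, player, final_reward=0.0, bonus=0.1):
    """Add a small bonus or penalty for each sub-board decided by the last move.

    `winners_before` / `winners_after` hold sub_board_winner() for all nine
    sub-boards before and after the move; `final_reward` is the usual
    end-of-game reward (0.0 while the game is still running).
    """
    reward = final_reward
    for old, new in zip(winners_before, winners_after):
        if old == 0 and new == player:
            reward += bonus  # this player just won a sub-board
        elif old == 0 and new != 0:
            reward -= bonus  # the opponent just won a sub-board
    return reward
```

The bonus should probably stay small compared to the reward for winning the whole game, since shaping like this can push the agent toward collecting sub-boards rather than winning.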

vjeux commented 8 months ago

Congratz, you won $100! Let me know how you want to receive the money by reaching out to me on twitter (@vjeux) or email: vjeuxx@gmail.com

Algorithm-Arena / weekly-challenge-8-ultimate-tic-tac-toe

Submission - Ultimate Tic Tac Toe RL #3
