**Open** · Casper-Guo opened this issue 1 month ago
Hi @Casper-Guo, I think we have actually done something similar to what you suggest! We used value iteration to solve the game for the Finkel, Blitz, Masters and Aseb rulesets.
I have been working on a blog post that details how this works, and another member of our Discord server, Jeroen, is planning to write a paper on it!
It's not very well documented at the moment, but the `Lut` class in RoyalUr-Java can read a table of all game states and their associated chances of light winning (https://github.com/RoyalUr/RoyalUr-Java/blob/main/src/main/java/net/royalur/lut/Lut.java). That table can be used to play optimally: you just pick the move that leads to the state with the highest chance of winning. We have also uploaded the model files for each of these rulesets on HuggingFace: https://huggingface.co/sothatsit/RoyalUrModels/tree/main. Jeroen's implementation for solving the game in Julia is also open-source: https://github.com/JeroenOlieslagers/game_of_ur
I'd say the main pitfall in writing an implementation for solving the game is managing the win percentages. In RoyalUr-Java, we just calculate all win percentages relative to the light player (i.e., the chance that light wins). This makes things easier, as swapping the chance back and forth between the light and dark perspectives is an easy place for subtle bugs to creep in. Getting this right is especially important for the greedy move selection.
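As a toy illustration of that convention (this is a hypothetical sketch, not the actual `Lut` API; the state labels, `successor` helper, and probabilities are made up), every stored value means "P(light wins)", and the dark player simply minimizes it rather than the table ever being re-keyed per player:

```python
# Toy sketch: win_prob[s] always means P(light wins | state s).
# Keeping one fixed perspective avoids the light/dark swap bugs.
win_prob = {"A": 0.70, "B": 0.40, "C": 0.55}

def successor(state, move):
    # Hypothetical move application: here each move maps directly to a state.
    return move

def best_move(state, moves, light_to_move):
    def score(move):
        p = win_prob[successor(state, move)]  # always light's winning chance
        return p if light_to_move else 1.0 - p  # dark minimizes light's chance
    return max(moves, key=score)

print(best_move("start", ["A", "B", "C"], light_to_move=True))   # light picks "A"
print(best_move("start", ["A", "B", "C"], light_to_move=False))  # dark picks "B"
```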
Good luck with your implementation! If you're interested, we have discussed a lot of things related to this on our Discord server: https://discord.gg/FbCp76esgb. It would be great to say hi if you're interested!
Hi, thanks for the pointer! I will direct some more detailed questions to the dedicated repo. I have taken a look and the code organization and some decisions we made are eerily similar haha.
> In RoyalUr-Java, we just calculate all win percentages relative to the light player (i.e., the chance that light wins). This makes it easier, as swapping the chance back and forth between light/dark can be an easy place for subtle bugs to creep in.
I am taking a similar approach. The game is seen purely from the white perspective, and so are the win rates. It is trivially easy to use this model to play as black anyway.
Can you please resend the discord invite? It doesn't seem to be working
Here's a new invite link: https://discord.gg/wYcR9MrQs2
What happened with the old one? If it's broken I should probably update the link in a few places! Thanks :D
This new one doesn't work either, so it's probably something on my end.
I am attempting to use policy iteration to achieve optimal play under the Finkel ruleset. I have run the search algorithm, and there are ~137M non-end board states. With some computing power, it should be feasible to run many iterations of the algorithm and converge to the optimal strategy.
I am currently bottlenecked by the RAM required to produce the full game graph needed to run the algorithm, but I expect to have access to some high-performance computing capacity in the near future. You can find my plans in planning.md in my repo. I am wondering if anyone else has attempted a similar approach, and whether there are any pitfalls I should watch out for. I suppose you could say this is just expectimax on steroids, given the unlimited backup depth. Any critique is appreciated!
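For anyone following along, the core backup step in such an approach (shared between value iteration and the evaluation phase of policy iteration) might look roughly like this. This is only a hedged sketch: `moves_for` and `apply_move` are hypothetical placeholders, and everything stays expressed as P(light wins), matching the single-perspective convention discussed above. The roll distribution assumes Finkel's four binary dice, so a roll of r pips has probability C(4, r)/16:

```python
# Toy Bellman backup for a dice game where value[s] = P(light wins | s).
from math import comb

ROLL_PROBS = {r: comb(4, r) / 16 for r in range(5)}  # four binary dice

def backup(value, state, moves_for, apply_move, light_to_move):
    """One backup: average over dice rolls, optimize over moves.

    Light maximizes P(light wins); dark minimizes the same quantity,
    so values never need to be flipped between perspectives.
    """
    total = 0.0
    for roll, p in ROLL_PROBS.items():
        succ_values = [value[apply_move(state, roll, m)]
                       for m in moves_for(state, roll)]
        best = max(succ_values) if light_to_move else min(succ_values)
        total += p * best
    return total
```

In real code the no-move case (e.g. a roll of zero) would pass the turn rather than pick from an empty move list; that detail is omitted here for brevity.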