**Open** · Casper-Guo opened this issue 1 month ago
Hi @Casper-Guo, I think we have actually done something similar to what you suggest! We used value iteration to solve the game for the Finkel, Blitz, Masters and Aseb rulesets.
I have been working on a blog post that details how this works, and another member of our Discord server, Jeroen, is planning to write a paper on it!
It's not very well documented at the moment, but the `Lut` class in RoyalUr-Java can read a table of all game states and their associated chances of light winning (https://github.com/RoyalUr/RoyalUr-Java/blob/main/src/main/java/net/royalur/lut/Lut.java). That table can be used to play optimally: you just pick the move that leads to the state with the highest chance of winning. We have also uploaded the model files for each of these rulesets on HuggingFace: https://huggingface.co/sothatsit/RoyalUrModels/tree/main. Jeroen's implementation for solving the game in Julia is also open-source: https://github.com/JeroenOlieslagers/game_of_ur
I'd say the main pitfall in writing an implementation for solving the game is managing the win percentages. In RoyalUr-Java, we just calculate all win percentages relative to the light player (i.e., the chance that light wins). This makes things easier, as swapping the chance back and forth between the light and dark perspectives is an easy place for subtle bugs to creep in. Getting this right is especially important for the greedy move selection.
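As a toy illustration of that convention (this is a hypothetical sketch, not the actual `Lut` API; the state labels, `successor` helper, and probabilities are made up), every stored value means "P(light wins)", and the dark player simply minimizes it rather than the table ever being re-keyed per player:

```python
# Toy sketch: win_prob[s] always means P(light wins | state s).
# Keeping one fixed perspective avoids the light/dark swap bugs.
win_prob = {"A": 0.70, "B": 0.40, "C": 0.55}

def successor(state, move):
    # Hypothetical move application: here each move maps directly to a state.
    return move

def best_move(state, moves, light_to_move):
    def score(move):
        p = win_prob[successor(state, move)]  # always light's winning chance
        return p if light_to_move else 1.0 - p  # dark minimizes light's chance
    return max(moves, key=score)

print(best_move("start", ["A", "B", "C"], light_to_move=True))   # light picks "A"
print(best_move("start", ["A", "B", "C"], light_to_move=False))  # dark picks "B"
```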
Good luck with your implementation! If you're interested, we have discussed a lot of things related to this on our Discord server: https://discord.gg/FbCp76esgb. It would be great to say hi if you're interested!
Hi, thanks for the pointer! I will direct some more detailed questions to the dedicated repo. I have taken a look and the code organization and some decisions we made are eerily similar haha.
> In RoyalUr-Java, we just calculate all win percentages relative to the light player (i.e., the chance that light wins). This makes it easier, as swapping the chance back and forth between light/dark can be an easy place for subtle bugs to creep in.
I am taking a similar approach. The game is seen purely from the white perspective, and so are the win rates. It is trivially easy to use this model to play as black anyway.
Can you please resend the discord invite? It doesn't seem to be working
Here's a new invite link: https://discord.gg/wYcR9MrQs2
What happened with the old one? If it's broken I should probably update the link in a few places! Thanks :D
This new one doesn't work either, so it's probably something on my end.
I am attempting to use policy iteration to achieve optimal play under the Finkel ruleset. I have run the search algorithm, and there are ~137M non-end board states. With some computing power, it should be feasible to run many iterations of the algorithm and converge to the optimal strategy.
I am currently bottlenecked by the RAM required to produce the full game graph needed to run the algorithm, but I expect to have access to some high-performance computing capacity in the near future. You can find my plans in planning.md in my repo. I am wondering if anyone else has attempted a similar approach, and whether there are any pitfalls I should watch out for. I suppose you could say this is just expectimax on steroids, given the unlimited backup depth. Any critique is appreciated!
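For anyone following along, the core backup step in such an approach (shared between value iteration and the evaluation phase of policy iteration) might look roughly like this. This is only a hedged sketch: `moves_for` and `apply_move` are hypothetical placeholders, and everything stays expressed as P(light wins), matching the single-perspective convention discussed above. The roll distribution assumes Finkel's four binary dice, so a roll of r pips has probability C(4, r)/16:

```python
# Toy Bellman backup for a dice game where value[s] = P(light wins | s).
from math import comb

ROLL_PROBS = {r: comb(4, r) / 16 for r in range(5)}  # four binary dice

def backup(value, state, moves_for, apply_move, light_to_move):
    """One backup: average over dice rolls, optimize over moves.

    Light maximizes P(light wins); dark minimizes the same quantity,
    so values never need to be flipped between perspectives.
    """
    total = 0.0
    for roll, p in ROLL_PROBS.items():
        succ_values = [value[apply_move(state, roll, m)]
                       for m in moves_for(state, roll)]
        best = max(succ_values) if light_to_move else min(succ_values)
        total += p * best
    return total
```

In real code the no-move case (e.g. a roll of zero) would pass the turn rather than pick from an empty move list; that detail is omitted here for brevity.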