danzel opened this issue 6 years ago
Changing this improves performance. Below is a comparison of without vs with the change, each against alpha-beta:
10,000 / false vs 15 Tuning1 (without -1)
Total Time 115695 / 131531 Total Wins 41 / 59
10,000 / true vs 15 Tuning1 (with -1, exploration = sqrt2 = 1.41)
Total Time 116419 / 137432 Total Wins 52 / 48
Then I looked at changing the exploration constant (these are all with -1):
exploration 1
Total Time 114667 / 130626 Total Wins 49 / 51
exploration 1.3
Total Time 114884 / 129239 Total Wins 52 / 48
exploration 1.5
Total Time 112397 / 135142 Total Wins 55 / 45
exploration 1.6
Total Time 113258 / 131641 Total Wins 54 / 46
exploration 1.7
Total Time 115280 / 141403 Total Wins 53 / 47
So it looks like a higher exploration constant, somewhere around 1.4-1.6, is probably stronger too.
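For reference, this is the standard UCT selection score the exploration constant feeds into. The project is C#; this is just an illustrative Python sketch, with hypothetical names:

```python
import math

def uct_score(child_wins, child_visits, parent_visits, exploration=1.41):
    """Standard UCT score: exploitation term plus exploration term.

    A larger `exploration` constant favours less-visited children for
    longer. 1.41 approximates sqrt(2), the textbook default."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploitation = child_wins / child_visits
    exploration_term = exploration * math.sqrt(
        math.log(parent_visits) / child_visits)
    return exploitation + exploration_term
```

Raising the constant from 1.0 to 1.5 only scales the second term, so the 1.4-1.6 sweet spot above is effectively a statement about how much extra weight exploration should get relative to observed win rate.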
Should look at an alternative to UCT.
UCB1-Tuned sounds like the other selection policy that gets used a fair bit: https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1_tuned.py https://webdocs.cs.ualberta.ca/~games/go/seminar/notes/2007/slides_ucb.pdf
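The idea of UCB1-Tuned is to replace the fixed exploration constant with a per-child, variance-based bound. A minimal Python sketch (illustrative only, not the project's code; `mean` and `variance` would come from the node's reward statistics):

```python
import math

def ucb1_tuned(mean, variance, child_visits, parent_visits):
    """UCB1-Tuned selection score (Auer et al.).

    The exploration width uses the observed reward variance, capped at
    1/4 (the maximum variance of a Bernoulli reward), instead of a
    hand-tuned constant."""
    if child_visits == 0:
        return float("inf")
    log_n = math.log(parent_visits)
    # upper confidence bound on the variance itself
    v = variance + math.sqrt(2.0 * log_n / child_visits)
    return mean + math.sqrt((log_n / child_visits) * min(0.25, v))
```

The appeal here is that it removes the exploration constant from the hyperparameter search entirely.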
Trying out UCT with progressive bias, using UtilityCalculators.TuneableByBoardPositionUtilityCalculator.Tuning1
to calculate the value of a state.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.2046&rep=rep1&type=pdf
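Progressive bias (per the Chaslot et al. paper linked above) adds a heuristic term to the UCT score that dominates at low visit counts and fades as real playout statistics accumulate. A hedged Python sketch, where `heuristic_value` stands in for the board-position utility (`Tuning1` in the project) and the names are illustrative:

```python
import math

def uct_pb_score(child_wins, child_visits, parent_visits,
                 heuristic_value, pb_weight=1.0, exploration=1.41):
    """UCT with progressive bias: standard UCT plus a heuristic term
    H / (n + 1) that decays as the child's visit count n grows."""
    if child_visits == 0:
        return float("inf")
    exploitation = child_wins / child_visits
    explore = exploration * math.sqrt(
        math.log(parent_visits) / child_visits)
    bias = pb_weight * heuristic_value / (child_visits + 1)
    return exploitation + explore + bias
```

The `pb_weight` parameter here corresponds to the weight being swept in the results below.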
Progressive bias costs ~15% performance. MCTS-PB vs MCTS (10k):
Weight 0.8: Total Wins 53 / 47
Weight 1.0: Total Wins 54 / 46
Weight 1.2: Total Wins 51 / 49
Weight 1.5: Total Wins 42 / 58
MCTS-PB (10k) vs AB (15)
0.8 Total Wins 54 / 46
1.0 Total Wins 49 / 51
1.2 Total Wins 49 / 51
MCTS (10k) vs AB (15)
55 / 45
So it looks like MCTS-PB is stronger against plain MCTS, but weaker against AB...
Pushed up this ^^
Next steps: grid search over the hyperparameters; UCB1-Tuned (http://mcts.ai/pubs/mcts-survey-master.pdf, page 16).
At the moment we back up a value of 0 on defeat. I think we are meant to be using -1. Should test with it.
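The difference between the two reward schemes can be sketched like this (illustrative Python, hypothetical names). Note that moving from {0, 1} to {-1, 0, 1} roughly doubles the value range, which is consistent with a larger exploration constant (1.4-1.6 rather than 1.0) testing stronger above:

```python
def backprop_reward(result, losing_value=-1.0):
    """Map a playout result to the value backed up the tree.

    result: +1 win, 0 draw, -1 loss, from the perspective of the player
    to move at the node. `losing_value=0.0` is the current behaviour;
    -1.0 is the proposed change, which actively drags a losing node's
    average value down instead of just failing to raise it."""
    if result > 0:
        return 1.0
    if result < 0:
        return losing_value
    return 0.0
```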