danzel opened this issue 6 years ago
Changing this improves performance. Below is a comparison of without vs with the change, each against alpha-beta:
10,000 / false vs 15 Tuning1 (without -1)
Total Time 115695 / 131531 Total Wins 41 / 59
10,000 / true vs 15 Tuning1 (with -1, exploration = sqrt2 = 1.41)
Total Time 116419 / 137432 Total Wins 52 / 48
Then I looked at changing the exploration constant (these are all with -1):
exploration 1
Total Time 114667 / 130626 Total Wins 49 / 51
exploration 1.3
Total Time 114884 / 129239 Total Wins 52 / 48
exploration 1.5
Total Time 112397 / 135142 Total Wins 55 / 45
exploration 1.6
Total Time 113258 / 131641 Total Wins 54 / 46
exploration 1.7
Total Time 115280 / 141403 Total Wins 53 / 47
So it looks like a higher exploration constant, somewhere around 1.4-1.6, is probably stronger too.
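For reference, this is the standard UCT selection score the exploration constant feeds into. The project is C#; this is just an illustrative Python sketch, with hypothetical names:

```python
import math

def uct_score(child_wins, child_visits, parent_visits, exploration=1.41):
    """Standard UCT score: exploitation term plus exploration term.

    A larger `exploration` constant favours less-visited children for
    longer. 1.41 approximates sqrt(2), the textbook default."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploitation = child_wins / child_visits
    exploration_term = exploration * math.sqrt(
        math.log(parent_visits) / child_visits)
    return exploitation + exploration_term
```

Raising the constant from 1.0 to 1.5 only scales the second term, so the 1.4-1.6 sweet spot above is effectively a statement about how much extra weight exploration should get relative to observed win rate.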
Should look at an alternative to UCT.
UCB1-Tuned sounds like the other selection policy that gets used a fair bit: https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1_tuned.py https://webdocs.cs.ualberta.ca/~games/go/seminar/notes/2007/slides_ucb.pdf
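The idea of UCB1-Tuned is to replace the fixed exploration constant with a per-child, variance-based bound. A minimal Python sketch (illustrative only, not the project's code; `mean` and `variance` would come from the node's reward statistics):

```python
import math

def ucb1_tuned(mean, variance, child_visits, parent_visits):
    """UCB1-Tuned selection score (Auer et al.).

    The exploration width uses the observed reward variance, capped at
    1/4 (the maximum variance of a Bernoulli reward), instead of a
    hand-tuned constant."""
    if child_visits == 0:
        return float("inf")
    log_n = math.log(parent_visits)
    # upper confidence bound on the variance itself
    v = variance + math.sqrt(2.0 * log_n / child_visits)
    return mean + math.sqrt((log_n / child_visits) * min(0.25, v))
```

The appeal here is that it removes the exploration constant from the hyperparameter search entirely.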
Trying out UCT with progressive bias, using UtilityCalculators.TuneableByBoardPositionUtilityCalculator.Tuning1
to calculate the value of a state.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.2046&rep=rep1&type=pdf
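Progressive bias (per the Chaslot et al. paper linked above) adds a heuristic term to the UCT score that dominates at low visit counts and fades as real playout statistics accumulate. A hedged Python sketch, where `heuristic_value` stands in for the board-position utility (`Tuning1` in the project) and the names are illustrative:

```python
import math

def uct_pb_score(child_wins, child_visits, parent_visits,
                 heuristic_value, pb_weight=1.0, exploration=1.41):
    """UCT with progressive bias: standard UCT plus a heuristic term
    H / (n + 1) that decays as the child's visit count n grows."""
    if child_visits == 0:
        return float("inf")
    exploitation = child_wins / child_visits
    explore = exploration * math.sqrt(
        math.log(parent_visits) / child_visits)
    bias = pb_weight * heuristic_value / (child_visits + 1)
    return exploitation + explore + bias
```

The `pb_weight` parameter here corresponds to the weight being swept in the results below.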
Progressive bias costs ~15% performance. MCTS-PB vs MCTS (10k):
Weight 0.8: Total Wins 53 / 47
Weight 1.0: Total Wins 54 / 46
Weight 1.2: Total Wins 51 / 49
Weight 1.5: Total Wins 42 / 58
MCTS-PB (10k) vs AB (15)
0.8 Total Wins 54 / 46
1.0 Total Wins 49 / 51
1.2 Total Wins 49 / 51
MCTS (10k) vs AB (15)
55 / 45
So it looks like MCTS-PB is stronger against plain MCTS, but weaker against AB...
Pushed up this ^^
Next steps: grid search over the hyperparameters; UCB1-Tuned (http://mcts.ai/pubs/mcts-survey-master.pdf, page 16).
At the moment we back up a value of 0 on defeat. I think we are meant to be using -1. Should test with it.
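The difference between the two reward schemes can be sketched like this (illustrative Python, hypothetical names). Note that moving from {0, 1} to {-1, 0, 1} roughly doubles the value range, which is consistent with a larger exploration constant (1.4-1.6 rather than 1.0) testing stronger above:

```python
def backprop_reward(result, losing_value=-1.0):
    """Map a playout result to the value backed up the tree.

    result: +1 win, 0 draw, -1 loss, from the perspective of the player
    to move at the node. `losing_value=0.0` is the current behaviour;
    -1.0 is the proposed change, which actively drags a losing node's
    average value down instead of just failing to raise it."""
    if result > 0:
        return 1.0
    if result < 0:
        return losing_value
    return 0.0
```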