Zeta36 / chess-alpha-zero

Chess reinforcement learning by AlphaGo Zero methods.
MIT License

Update on work in progress #72

Open brianprichardson opened 6 years ago

brianprichardson commented 6 years ago

I am currently working on larger NNs (256x20), still using supervised PGN file input. I have disabled "testeval".

Trying 32 policy and value head filters, as Leela does and as suggested here: https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a

The last "best NN" was 128x10 with 8 policy and 4 value head filters.
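To get a feel for what raising the head filters does to model size, here is a rough weight count. It assumes an AlphaZero-style layout (a 1x1 convolution with `k` filters feeding a flattened dense layer), 1968 policy outputs (the UCI move-label count I believe this repo uses) and a 256-unit hidden layer in the value head; batch-norm parameters are ignored. The function names and the layout are illustrative assumptions, not the repo's model code.

```python
# Rough weight counts for AlphaZero-style policy/value heads.
# Assumed layout (not taken from the repo): 1x1 conv (no bias) -> flatten 8x8xk -> dense.
BOARD_SQUARES = 64


def policy_head_weights(channels, k, n_moves=1968):
    """Weights in a policy head with k filters; n_moves=1968 is an assumption."""
    conv = channels * k                            # 1x1 convolution, no bias
    dense = BOARD_SQUARES * k * n_moves + n_moves  # flattened features -> move logits (+ bias)
    return conv + dense


def value_head_weights(channels, k, fc=256):
    """Weights in a value head with k filters and an assumed fc-unit hidden layer."""
    conv = channels * k                    # 1x1 convolution, no bias
    hidden = BOARD_SQUARES * k * fc + fc   # flattened features -> hidden layer (+ bias)
    out = fc + 1                           # hidden layer -> single tanh output (+ bias)
    return conv + hidden + out
```

Under these assumptions the policy head grows from about 1.0M weights (128 channels, 8 filters) to about 4.0M (256 channels, 32 filters), because the dense layer scales linearly with `k`.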

Also trying a learning rate finder (https://github.com/surmenok/keras_lr_finder). LR = 0.015 looks good.
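The idea behind an LR finder (the "LR range test") is to train briefly with an exponentially increasing learning rate, record the loss at each step, and pick a rate from the region where the loss falls fastest, before it blows up. Below is a minimal, framework-free sketch of the two pieces, the LR sweep and the selection rule. The function names are hypothetical and this is not the keras_lr_finder API; in practice each loss would come from training one mini-batch at that learning rate.

```python
import math


def lr_range(start_lr, end_lr, n_steps):
    """Learning rates for an LR range test, spaced geometrically from start_lr to end_lr."""
    if n_steps < 2:
        raise ValueError("need at least two steps")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (n_steps - 1)) for i in range(n_steps)]


def steepest_lr(lrs, losses, lag=1):
    """Return the learning rate at which the recorded loss fell fastest.

    The slope at step i is the average loss change over the previous `lag`
    steps; the LR at the step with the most negative slope is returned.
    """
    best_i, best_slope = None, math.inf
    for i in range(lag, len(losses)):
        slope = (losses[i] - losses[i - lag]) / lag
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]
```

Real implementations usually smooth the loss curve first (a larger `lag` here), because per-batch losses are noisy.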

Also trying the 1cycle LR policy: https://medium.com/@nachiket.tanksale/finding-good-learning-rate-and-the-one-cycle-policy-7159fe1db5d6
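A 1cycle schedule warms the learning rate up linearly from a low value to a peak, brings it back down over the same number of steps, and then anneals it well below the starting value for the last few steps. The sketch below uses the LR = 0.015 from the finder as the peak; the divide factors, the 90% cycle length and the helper name are illustrative assumptions, not the settings used in this thread.

```python
def one_cycle_lr(step, total_steps, max_lr=0.015, div_factor=10.0,
                 final_div=100.0, pct_cycle=0.9):
    """Learning rate at a given step of a 1cycle schedule (hypothetical helper).

    The first pct_cycle of the steps form a symmetric triangle between
    max_lr / div_factor and max_lr; the remaining steps anneal linearly
    toward (max_lr / div_factor) / final_div.
    """
    base_lr = max_lr / div_factor
    cycle_steps = round(total_steps * pct_cycle)
    half = max(cycle_steps // 2, 1)
    if step < half:  # warm-up: base_lr -> max_lr
        return base_lr + (max_lr - base_lr) * step / half
    if step < cycle_steps:  # cool-down: max_lr -> base_lr
        return max_lr - (max_lr - base_lr) * (step - half) / (cycle_steps - half)
    # annihilation: base_lr -> end_lr over the remaining steps
    end_lr = base_lr / final_div
    tail = max(total_steps - cycle_steps, 1)
    return base_lr - (base_lr - end_lr) * (step - cycle_steps) / tail
```

In Keras, `LearningRateScheduler` calls its schedule once per epoch, so a per-batch schedule like this would need a small custom callback that sets the optimizer's learning rate in `on_train_batch_begin` (`on_batch_begin` in older Keras).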

Will also try 2 epochs per batch.

Things take time to run.

In the future I will try skipping the first n moves of each game, since I would run the engine with an opening book anyway.
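Skipping the opening only changes which positions become training samples: the board still has to be replayed through the skipped moves so that later positions are correct. A minimal sketch, assuming a game is a list of moves and that "n moves" means n full moves (2n plies); the helper name is hypothetical.

```python
def training_plies(game_moves, skip_moves):
    """Yield (ply, move) pairs to train on, skipping the first skip_moves full moves.

    One full move is two plies (White's move plus Black's reply), so the first
    2 * skip_moves plies are dropped. The caller is still expected to apply
    every move to its board, including the skipped ones.
    """
    skip_plies = 2 * skip_moves
    for ply, move in enumerate(game_moves):
        if ply >= skip_plies:
            yield ply, move
```

With an opening book in play, `skip_moves` would naturally be set to roughly the book depth.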

Likewise, I would like to try using tablebases.
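One way tablebases could feed the supervised pipeline is by replacing the game result with the exact tablebase result whenever a position is small enough to probe. In python-chess, `chess.syzygy.open_tablebase(path)` returns a tablebase whose `probe_wdl(board)` scores the position for the side to move: 2 win, 1 cursed win, 0 draw, -1 blessed loss, -2 loss (the ±1 cases are wins or losses that the 50-move rule turns into draws). Counting cursed wins and blessed losses as draws is an assumption of this sketch, the helper names are hypothetical, and `game_value` must also be from the side to move's point of view.

```python
def wdl_to_value(wdl):
    """Map a Syzygy WDL score (-2..2, side to move) to a value target in [-1, 1].

    Cursed wins (1) and blessed losses (-1) are treated as draws here, since the
    50-move rule prevents them from being converted in a real game.
    """
    if wdl not in (-2, -1, 0, 1, 2):
        raise ValueError(f"unexpected WDL score: {wdl}")
    if wdl == 2:
        return 1.0
    if wdl == -2:
        return -1.0
    return 0.0


def value_target(game_value, wdl=None):
    """Prefer the exact tablebase result over the game outcome when one is available."""
    return game_value if wdl is None else wdl_to_value(wdl)
```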

brianprichardson commented 6 years ago

32 policy and value head filters look fine. The LR finder and 1cycle are also looking good.

Now trying 2 epochs per batch. It takes about a week to tune and test for improvement. I use cutechess-cli for testing because the "eval" command tends to produce repeated games.

brianprichardson commented 6 years ago

The larger 256x20 nets take considerably longer, so I am still crunching on training. The cutechess run to measure any improvement also takes much longer, because the games themselves take longer to play.
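A back-of-the-envelope count shows why the bigger nets are so much slower. Assuming "256x20" means 256 filters and 20 residual blocks, each with two 3x3 convolutions (the usual AlphaZero/Leela layout), and ignoring the input layer, batch norm and the heads, the tower's weights scale with filters squared times blocks. With "same" padding each weight is applied at all 64 squares, so multiply-adds per position are roughly 64 times the weight count. The helper names are hypothetical.

```python
BOARD_SQUARES = 64


def tower_weights(filters, blocks):
    """Weights in a residual tower of 3x3 convs (two per block, no bias)."""
    per_conv = 3 * 3 * filters * filters
    return 2 * per_conv * blocks


def tower_macs(filters, blocks):
    """Approximate multiply-adds per position: each weight is used at all 64 squares."""
    return BOARD_SQUARES * tower_weights(filters, blocks)


for filters, blocks in [(128, 10), (256, 7), (256, 20)]:
    print(f"{filters}x{blocks}: {tower_weights(filters, blocks):,} tower weights")
```

Under these assumptions a 256x20 tower has 8 times the weights, and roughly 8 times the per-position compute, of a 128x10 tower, and about 2.9 times that of 256x7.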

brianprichardson commented 6 years ago

Wrong, wrong, wrong.

After the better part of 3 weeks of training plus a week of testing, the new 256x20 net is clearly at least 100 Elo worse than the prior best net (at fixed playouts, never mind at equal time). Moreover, I'm not even sure which net is the prior best (256x7, or 128x10, ???). Maybe the 32 policy and value head filters are not a good thing. Maybe testeval should be left on. I still think I like the LR finder and 1cycle. Policy and value weights: who knows. Time to stop "ready, fire, aim".
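For reference, the standard logistic Elo model turns a match score s into a rating difference of -400 * log10(1/s - 1), so a 100 Elo deficit corresponds to scoring about 36%. The sketch below adds a rough 95% interval from a normal approximation on per-game scores. cutechess-cli prints its own Elo estimate and error margin, which may be computed differently; the function names here are hypothetical.

```python
import math


def elo_diff(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, using a normal approximation.

    Each game scores 1, 0.5 or 0; the interval is score +/- z standard errors,
    clipped to (0, 1) and converted to Elo. A 100% or 0% score raises ValueError.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - score) ** 2 + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    stderr = math.sqrt(var / n)
    eps = 1e-9
    low = elo_diff(min(max(score - z * stderr, eps), 1 - eps))
    high = elo_diff(min(max(score + z * stderr, eps), 1 - eps))
    return elo_diff(score), low, high
```

For example, `match_elo(30, 20, 50)` (a 40% score) gives roughly -70 Elo with an interval of roughly -135 to -10, which shows why a reliable comparison needs many games.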

So I am taking a deep breath and a big step back to start again. Instead of trying to go back to the point of the current best net (which I'd have to find first and could probably never reproduce), I'm considering starting small and simple by learning only 3-piece endgames, then 4, 5, and 6 pieces. These nets could be small and fast, and hopefully won't waste a whole month. Thanks to @dkappe for the idea in the Leela Chess Discord.
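The staged plan could be organized around the piece count of each training position. A minimal sketch, assuming positions are stored as FEN strings and that each stage keeps everything up to its piece limit (so the 4-piece stage still includes 3-piece positions); the helper names and the cumulative design are assumptions, not a described method.

```python
def piece_count(fen):
    """Count pieces (kings included) in the board field of a FEN string."""
    board_field = fen.split()[0]
    return sum(ch.isalpha() for ch in board_field)


def curriculum_stages(fens, stages=(3, 4, 5, 6)):
    """Yield (max_pieces, positions) per stage, smallest endgames first.

    Stages are cumulative: a stage keeps every position with at most
    max_pieces pieces, so earlier material stays in the mix.
    """
    for max_pieces in stages:
        yield max_pieces, [fen for fen in fens if piece_count(fen) <= max_pieces]
```

Keeping earlier stages in the mix is meant to reduce forgetting of the simpler endgames as larger ones are added; training on each piece count in isolation is the obvious alternative.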

This is what can happen when watching the Lc0 project.