Move history (8 half moves) input request

Zeta36 / chess-alpha-zero

Chess reinforcement learning by AlphaGo Zero methods.

MIT License

2.13k stars 481 forks source link

Move history (8 half moves) input request #67

Closed brianprichardson closed 4 years ago

brianprichardson commented 6 years ago

@Akababa and @benediamond and @anyothers with comments or suggestions: I noticed some older efforts in your forks with the 8 (half) move history AZ input idea. After considerable, but far from exhaustive, testing with only the current board as input, I am starting to think that some move history is critical. LCZero uses it with a far smaller (64x6) NN than I have been training (256x7), and my net still drops material carelessly. This is after several hundred thousand supervised games.

Anyway, I may be missing other larger picture issues (random self-play works better than supervised?), but I would like try supervised training with some move history, perhaps starting with 4 half moves.

Accordingly, could you please point me to your "best" versions. One was with 110 and another with 105 (96+5), IIRC. I plan to just graft the input back on to the relatively stable Zeta master that I have been fiddling with to use pgn input.

Thanks, Brian

Akababa commented 6 years ago

I don't see any reason for move history to help with tactics; since there isn't really any causation there I'd expect the weights in the history planes to tend to 0 asymptotically. However I guess it could produce a better result for a small amount of total training. Also of note might be the Giraffe chess project which I think got to ~2400 level without history heuristics.

That being said, I think I unfortunately deleted my branches with the history planes :( You could probably get to where I was relatively quickly though as I was just using my laptop. Good luck!

benediamond commented 6 years ago

Hey @brianprichardson, as @Akababa mentioned---though it's been a while, I remember adding move history primarily in an effort to avoid draws by repetition. After all, 8 is precisely the number of halfmoves of history one must retain in order to access the current position together with two 4-halfmove repetition cycles. (draws by repetition can happen over longer periods, but this is rare.) Absent these, how could the computer learn to avoid draws?

In any case, I only kept one active branch, master, and if I recall the code for feature generation can be found here and in surrounding methods:

https://github.com/benediamond/chess-alpha-zero/blob/a25fd924267aaad65d408b70bd7d1e64d493645a/src/chess_zero/env/chess_env.py#L167-L178

Let me know if this helps!

brianprichardson commented 6 years ago

Thanks to @Akababa and @benediamond for your responses. I am still working on things, but I am learning as I go, so it is taking some time. I need to get to where I understand things well enough to know when a given NN is about as good as it can get (with supervised play) which would be a relatively stable baseline, and then try to add history input to see if that helps. Along the way, I keep finding good old GIGO (garbage in garbage out) mistakes. I did modify things to only input positions with mate in one (half) move and it learned to 99%+ accuracy (after just a few thousand samples and 20 epochs in a couple of minutes), so it does indeed learn. However, it is a rather large step from learning just one or two things to learning the entire game. Thanks again for sharing your work.

PS Well aware of Giraffe. Perhaps you might find the Texel / Giraffe experiments interesting. Search on Talkchess. Giraffe search with Texel eval, and Texel with Giraffe eval, adjusting for time.

brianprichardson commented 4 years ago

Just to close off this issue...I have done extensive testing of move history, 3 reps, and rule 50 input being supplied to the nets, and at least for Leela Chess, history and 3 reps do not help, and rule 50 seems to cost about 18 Elo. This might be because the nets are being trained on positions, whereas the position and the moves leading to it are handled by the net plus the search code.