Debneil / lczero-colab-files


Some observations #1

Open brianprichardson opened 4 years ago

brianprichardson commented 4 years ago

This will be most helpful to people looking to train nets. Thank you for taking the time to write it up.

Some feedback:

The CCRL Standard Dataset (CSD) network is 10b, while your yaml specifies 20b. The 20b net takes longer both to train and to run the match, so it is not a fair comparison.

I suggest using a yaml with the learning rate, number of steps, batch size, etc. as close as possible to the CSD one. Also, in your yaml the total steps of 100K means the last LR stage, which starts at 130K, is never run?
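
For illustration, a yaml along those lines might look like the sketch below. The field names follow the lczero-training config format, but the values are placeholders rather than the actual CSD hyper-parameters, so copy the real numbers from the published CSD yaml:

```yaml
# Illustrative sketch only -- take the real values from the CSD yaml.
model:
  filters: 128            # 10b-class net, matching the CSD network size
  residual_blocks: 10
training:
  batch_size: 1024        # placeholder value
  total_steps: 200000
  value_loss_weight: 1.0
  policy_loss_weight: 1.0
  lr_values: [0.1, 0.01, 0.001, 0.0001]
  lr_boundaries: [80000, 140000, 180000]  # all below total_steps, so every LR stage actually runs
```

The key structural point is that every entry in `lr_boundaries` must be below `total_steps`, otherwise the later learning-rate drops are never reached.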

Overall suggestion is to try to replicate the CSD results first to make sure your own methodology is correct. The match results should be very close to 50/50. Then, try training another net with different input.

It is very important to point out changing value_loss_weight to 1.0, as your yaml does. I think the old CSD yaml has not been updated; people should not use the old yaml unchanged.

Also, the CSD training data includes both PGN and chunk files, so there is no need to preprocess. Note that chunk files generated by the training tool from regular PGN input contain policy info only for the one move actually played; it turns out this reduces net strength by about 150 Elo. For working with the CSD as a baseline this is fine, but it is something to keep in mind. Dkappe has shared input files created from PGNs that were augmented with quick Stockfish searches to add policy values for additional moves (the BadGyal data).

The CSD ran for 200K steps I think. This can take quite a long time, as you point out.

For test matches, again it is helpful to verify the methodology first. From time to time I do a "sanity check" by running two identical copies of the same engine/net against each other to make sure the results are almost exactly 50/50. I have found that restart=on seems to be very important, but I have not looked at it in a while.
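
To put a number on such a sanity check, you can convert the match score into an implied Elo difference with the standard logistic model (my own illustration, not part of the guide; a healthy self-match should come out within a few Elo of zero):

```python
import math

def elo_from_score(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference implied by a match result.

    Uses the standard logistic model: expected score = 1 / (1 + 10**(-elo/400)).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if not 0 < score < 1:
        raise ValueError("need a score strictly between 0 and 1")
    return -400 * math.log10(1 / score - 1)

# A near-50/50 self-match (48 wins, 52 losses, 100 draws) implies a
# difference of only a few Elo points:
print(elo_from_score(48, 52, 100))
```

This ignores the error bars, which for short matches are large; a few hundred games are needed before a single-digit Elo estimate means anything.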

If the nets are the same size (10b v 10b), then a fixed-nodes-per-move test (which your example uses) is fine and very fast (you can use a concurrency of 2 or 3 depending on h/w). For nets of different sizes, time-per-move matches are more appropriate, although the node counts can also be scaled in the ratio of the speed difference; I prefer actual times per move.
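
The node-scaling idea can be sketched as follows (my own illustration with made-up nps figures, not measurements from the guide):

```python
def scaled_nodes(base_nodes: int, base_nps: float, other_nps: float) -> int:
    """Scale a per-move node budget by the speed ratio of two nets.

    If the larger net searches at half the nodes per second of the
    smaller one, give it half the nodes so that the wall-clock time
    per move is roughly equal for both engines.
    """
    return round(base_nodes * other_nps / base_nps)

# Hypothetical example: a 10b net at 40 knps gets 800 nodes per move,
# so a 20b net at 20 knps gets:
print(scaled_nodes(800, 40000, 20000))  # -> 400
```

In practice the nps figures should be measured on the same hardware and positions used for the match, since relative speed varies with GPU and batch settings.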

Informal tests I ran put the CSD net at about 2,900 Elo. I was testing v Crafty (see CCRL ratings for Crafty on 1-4 CPUs). Performance will vary of course depending on GPU.

Again, this is a long overdue guide and you made a great start. I hope the guide can be improved even more and eventually include training with a local GPU.

Finally, probably should mention that this is all Supervised Learning (SL) from regular PGN games that have already been played. The main Leela chess project uses Reinforcement Learning (RL) where the input games are from Leela nets playing each other (which incidentally includes all of the move policy info). Many Leela nets are trained with a smaller number of input games (tens of thousands instead of millions) and the nets gradually improve from random play. Hundreds of incrementally better nets are created with the same hyper-parameters and this is called a Run. Currently there are three separate Runs being done, and there are many older Runs.

Generating the self-play games takes a lot of GPU power, so RL is not practical for most individual efforts. Of course, the already created Leela games can be used with SL to train new nets. Some even start with SL and then "finish off" with some RL. This might be getting too far off-topic, but some context would be helpful, I think.

Debneil commented 4 years ago

Thank you for your well-thought-out and constructive criticism, and for your insights into matters I had little to no knowledge about. The truth is, I literally started all of this a couple of days back. I had to look through numerous sources and ask for help countless times just to get it to work. None of this excuses the glaring errors and holes in my methodology that you have pointed out; it is just the reason they are there.

I shall do my best to address every issue you have pointed out over the coming weeks, and keep modifying the guide accordingly. I may require a fair bit of guidance along the way. Would it be much of a bother if I ping you from time to time to understand or clarify some things?

In any case, I'm keeping this issue open until I've sorted out all the fine points.