Closed: hans-ekbrand closed this 3 years ago
Whether we down sample by a factor of 32 before or after shouldn't matter - we do multiple passes of the data and get multiple chances to apply value filtering to the results. The purpose of the 1/32 sampling is to get better shuffling of the data. However, the current implementation doesn't simply down sample by a factor of 32 afterwards - it's some curious if statement attempting to bring the total selection ratio to 1/32. That may reduce the shuffling quality compared to before, although probably not meaningfully (almost certainly not meaningfully compared to pre-value-focus). If anything, the only upside I would expect from this PR is maybe higher training speed when the data pipeline is the bottleneck.
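For intuition, here is a minimal sketch of the two orderings being discussed - not the actual `train.py`/`chunkparser.py` logic; the function names and the linear shape of `value_focus_prob` are assumptions:

```python
import random

SKIP = 32  # probabilistic 1-in-32 downsampling, as in train.py

def value_focus_prob(orig_q, best_q, value_focus_min=0.1, slope=1.0):
    # Hypothetical keep probability: positions where search moved the eval
    # a lot are kept more often; value_focus_min is the floor.
    return min(1.0, value_focus_min + slope * abs(best_q - orig_q))

def downsample_then_focus(records):
    # Pre-PR ordering: value_focus only ever sees ~1/SKIP of the positions
    # offered in a given pass over the data.
    for r in records:
        if random.random() < 1.0 / SKIP:
            if random.random() < value_focus_prob(r["orig_q"], r["best_q"]):
                yield r

def focus_then_downsample(records, mean_focus_prob):
    # This PR's ordering: value_focus sees every position, and a second stage
    # downsamples the survivors so the *total* rate stays near 1/SKIP.
    # mean_focus_prob is the average value_focus keep probability of the data.
    second_stage = min(1.0, 1.0 / (SKIP * mean_focus_prob))
    for r in records:
        if random.random() < value_focus_prob(r["orig_q"], r["best_q"]):
            if random.random() < second_stage:
                yield r
```

In both orderings a given record is accepted with probability proportional to its `value_focus_prob`, which is why, over enough passes, the q-diff distribution of the sampled records comes out the same.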
> Whether we down sample by a factor of 32 before or after shouldn't matter - we do multiple passes of the data and get multiple chances to apply value filtering to the results.
Point taken - I didn't know we did multiple passes of the data. Does each pass start from the same position?
Still, if the number of passes is (well) below 32, some positions will never be shown to value focus, and in that case this PR will have its intended effect.
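For a sense of scale: with 1-in-32 sampling per pass, a position is missed in all of n passes with probability (31/32)^n, so after 8 passes roughly 78% of positions have still never been offered to value focus, and even 32 passes leave about 1/e ≈ 37% unseen.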
I have tested the code and it works. As tilps pointed out, one effect of this PR is that it speeds up the sampling of positions. While that is in itself a good thing, I wouldn't have bothered going further with this PR if that were the only effect. In my testing I discovered a bug in the current code that makes an assertion in insert_or_replace() in shufflebuffer.py fail (see https://discord.com/channels/425419482568196106/425419999096733706/866464023212720149 for an elaborated description of that problem).
The PR turned out to be required for low values of `value_focus_min`. Without the PR, `train.py` bails out due to the failed assertion.
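For context, a minimal sketch of the kind of check that trips - modeled on `insert_or_replace()` in `shufflebuffer.py`, but the body here is illustrative, not the real implementation:

```python
class ShuffleBuffer:
    def __init__(self, elem_size, elem_count):
        self.elem_size = elem_size  # every record must be exactly this many bytes
        self.buffer = bytearray(elem_size * elem_count)

    def insert_or_replace(self, item):
        # The assertion that fails: an upstream sampling bug can hand the
        # buffer a record of the wrong length (e.g. empty or truncated).
        assert len(item) == self.elem_size, len(item)
        # ... insert item, or swap it with a random buffered record ...
```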
I tested the PR only with SL, and there I got good results with `value_focus_min: 0.1`, trained for 7,200 steps on a set of 875,912 chunks of T74 training data. In RL, I suspect higher values are better, but I don't know how high one can go and still have an effect. Since T74 is about to be retired, I suggest we finish it off with a value_focus experiment.
Closing this since it does not affect the distribution of q diffs in the sampled records. The bug revealed here is still a problem, but it is most likely merely hidden by this PR rather than fixed.
`SKIP=32` in `train.py` defines how many sequential positions to discard, to avoid sampling over-correlated positions, since they typically come from sequential positions in a game. (Technically, `SKIP` is used to create a probability of inclusion, so there is no hard limit against sequential positions; they are just unlikely to appear.) After discarding 31 out of 32 positions, `value_focus` discards further positions, especially positions where `orig_q` is very similar to the eval after search (`best_q`). This patch does two things:

1. `value_focus` sampling gets to see all positions (instead of only one position in 32).
2. Downsampling still targets `SKIP=32`, so that the total sampling rate is 1 over `SKIP` when the downsampling performed by `value_focus` is accounted for.

Preliminary testing of the idea rather than the implementation - by hardcoding `SKIP=1` and setting `value_focus_min` to 1/32 - suggests this is an improvement (see the sketch below).
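To make the preliminary test concrete, here is a hedged sketch of the effective keep probability with `SKIP` hardcoded to 1 - the linear shape and the slope parameter are assumptions, not the exact chunkparser formula:

```python
def keep_prob(q_diff, value_focus_min=1 / 32, slope=1.0):
    # With SKIP = 1 every position reaches value_focus. A floor of 1/32
    # reproduces the old effective rate for "uninteresting" positions
    # (orig_q ~= best_q), while positions where search changed the eval
    # are kept more often.
    return min(1.0, value_focus_min + slope * abs(q_diff))

print(keep_prob(0.0))  # 0.03125 -> matches the old 1-in-32 sampling rate
print(keep_prob(0.5))  # 0.53125 -> "interesting" positions sampled far more often
```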