Add support for min_batch to reduce non-determinism in cuda backends.

LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

GNU General Public License v3.0


Add support for min_batch to reduce non-determinism in cuda backends. #1961

Closed. Tilps closed this 5 months ago.

Tilps commented 5 months ago

This is pre-work for the policy_tournament upstreaming.

Tilps commented 5 months ago

It's been a while. I tested a random old T80 net with my 4090 and I actually needed 16 to get determinism... I wonder if 4 is no longer a sensible default.

Naphthalin commented 5 months ago

While I'm not exactly sure whether reducing non-determinism justifies this, there are more reasons why adding a min_batch_size_ is a good idea:

makes things symmetric with max batch size
should reduce high pitched and unstable coil whine (coming from small batches)
very small batches might be slower due to optimizations.
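
To make the idea concrete, here is a minimal sketch of what enforcing a minimum batch size could look like: a batch smaller than the minimum is padded with dummy positions, so the GPU always sees at least that many entries. This is not the code from this PR. The names PaddedBatchSize, PadInputs and floats_per_position are hypothetical, and the zero-filled padding is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not lc0's actual code): the batch size that would be
// submitted to the backend for `actual` real positions.
inline int PaddedBatchSize(int actual, int min_batch, int max_batch) {
  assert(min_batch >= 1 && min_batch <= max_batch);
  assert(actual >= 0 && actual <= max_batch);
  if (actual == 0) return 0;  // Nothing to evaluate, so nothing to pad.
  return std::max(actual, min_batch);
}

// Hypothetical helper: pads a flat input buffer with zero-filled dummy
// positions up to the padded batch size. Returns the number of real
// positions so the caller can ignore the outputs of the padding entries.
inline int PadInputs(std::vector<float>& inputs, int floats_per_position,
                     int min_batch, int max_batch) {
  assert(floats_per_position > 0);
  const int actual = static_cast<int>(inputs.size()) / floats_per_position;
  const int padded = PaddedBatchSize(actual, min_batch, max_batch);
  inputs.resize(static_cast<std::size_t>(padded) * floats_per_position, 0.0f);
  return actual;
}
```

With a minimum of 4, a single queued position would be evaluated as a batch of 4, and only the first output would be read back. The cost is some wasted compute on the padding entries, which is the trade-off behind the question of whether the default should be 4 or 16.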