Benchmark:
./lc0 benchmark -b eigen --threads=$num_threads -w b30e742bcfd905815e0e7dbd4e1bafb41ade748f85d006b8e28758f1a3107ae3 --num-positions=34 --movetime=-1 --nodes=26000
Problems:
Spinning wastes CPU cycles, especially when other functions or applications also need the CPU.
Even when nothing else is competing for the CPU, a highly contended spin degrades performance.
Example:
When num_threads=128, spin-waiting on pendingsearchers takes ~18% of CPU cycles.
When num_threads=160, this spin takes ~88% of CPU cycles, and the score drops to ~50% of the score at num_threads=128.
Solution in this Change:
Use an exponential back-off spin with sleep to reduce contention on pendingsearchers, and release the CPU when it stays unavailable for a long time.
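For illustration only, the back-off idea can be sketched as below. This is a minimal sketch, not the code in this change: the class name `ExponentialBackoff`, the function `WaitUntilZero`, the thresholds `kMaxSpins` and `kMaxSleep`, and the use of `std::this_thread::yield()` as the spin step are all assumptions made for the example.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Assumed tuning values, chosen only for this example.
constexpr int kMaxSpins = 64;                          // spin budget before sleeping
constexpr std::chrono::microseconds kMaxSleep{1000};   // upper bound on one sleep

// Hypothetical helper: each failed attempt waits longer than the previous one.
class ExponentialBackoff {
 public:
  // Call once per failed attempt to make progress.
  void Wait() {
    if (spins_ <= kMaxSpins) {
      // Spin phase: yield spins_ times, then double the count.
      for (int i = 0; i < spins_; ++i) std::this_thread::yield();
      spins_ *= 2;
    } else {
      // Sleep phase: give the CPU back, doubling the sleep up to the cap.
      std::this_thread::sleep_for(sleep_);
      if (sleep_ < kMaxSleep) sleep_ *= 2;
    }
  }

  // True once the spin budget is exhausted and Wait() sleeps instead.
  bool sleeping() const { return spins_ > kMaxSpins; }

 private:
  int spins_ = 1;
  std::chrono::microseconds sleep_{10};
};

// Blocks until `pending` reaches zero. `pending` stands in for a
// pendingsearchers-style counter; the name is illustrative.
void WaitUntilZero(const std::atomic<int>& pending) {
  ExponentialBackoff backoff;
  while (pending.load(std::memory_order_acquire) != 0) backoff.Wait();
}
```

Short waits stay in the cheap spin phase, while long waits stop hammering the shared counter and let the scheduler run other work.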
With this change:
CPU utilization is reduced for num_threads from 64 to 160.
The score stays generally flat up to 128 threads and improves between 128 and 160.
This change improves performance in high core count scenarios and reduces CPU utilization.
Environment:
Hardware: Intel Ice Lake platform, 2 sockets, 80 physical cores, 160 logical cores
OS: CentOS Stream 9
Results by num_threads (opt/base - 1):

| opt/base - 1 | 8 | 16 | 32 | 64 | 128 | 160 |
| -- | -- | -- | -- | -- | -- | -- |
| CPU utilization (count) change | 0% | 0% | -1% | -3% | -13% | -36% |
| Performance improvement | 0% | 1% | -1% | 1% | 0% | 67% |