Benchmark:
./lc0 benchmark -b eigen --threads=$num_threads -w b30e742bcfd905815e0e7dbd4e1bafb41ade748f85d006b8e28758f1a3107ae3 --num-positions=1
Problems:
With the latest lc0 code, the benchmark sometimes hits a bad case when CPU resources are constrained. The cause is the proactive wake-up of TaskWorkers.
When CPU resources are constrained, the proactive wake-up can push the SearchWorker off the CPU and put TaskWorkers on it. Since there is nothing in pickingtasks yet, the TaskWorkers busy-wait while actually starving. Even if they yield, the OS scheduler will probably pick the TaskWorkers over the SearchWorker, since they have been asleep for a long time.
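To make the contrast concrete, here is a minimal sketch of the opposite policy: workers block until a task is actually queued, so a woken worker always has something to pick. It is written in Python for brevity and is not lc0 code. `TaskPool`, `submit`, and `shutdown` are hypothetical names, and this is not necessarily how this PR changes the wake-up path.

```python
import threading
from collections import deque


class TaskPool:
    """Hypothetical worker pool; not lc0's SearchWorker/TaskWorker code."""

    def __init__(self, num_workers):
        self._cond = threading.Condition()
        self._tasks = deque()
        self._stop = False
        self.processed = 0
        self._threads = [
            threading.Thread(target=self._worker) for _ in range(num_workers)
        ]
        for t in self._threads:
            t.start()

    def submit(self, task):
        # Wake a single worker, and only after the task is in the queue.
        with self._cond:
            self._tasks.append(task)
            self._cond.notify()

    def _worker(self):
        while True:
            with self._cond:
                # Block (no busy waiting) until there is work or we are stopping.
                self._cond.wait_for(lambda: self._stop or self._tasks)
                if not self._tasks:
                    return  # stopping and the queue is drained
                task = self._tasks.popleft()
            task()  # run the task outside the lock
            with self._cond:
                self.processed += 1

    def shutdown(self):
        # Let workers drain the remaining tasks, then join them.
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        for t in self._threads:
            t.join()
```

With this policy a worker never spins on an empty queue, so it does not compete for a core with the thread that produces the tasks.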
To simulate a CPU-constrained moment, we use taskset to pin the benchmark to a set of core ids whose count equals num_threads, e.g.
taskset -c 0-7,80-87 ./lc0 benchmark -b eigen --threads=16 -w b30e742bcfd905815e0e7dbd4e1bafb41ade748f85d006b8e28758f1a3107ae3 --num-positions=1
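For other thread counts, the core list can be generated rather than typed by hand. The helper below is a sketch, not part of lc0. The names `cpu_list` and `benchmark_command` are hypothetical, and the default `sibling_offset=80` simply mirrors the `0-7,80-87` pattern in the example above. Check the actual CPU numbering of the machine (for instance with `lscpu`) before relying on that value.

```python
def cpu_list(num_threads, sibling_offset=80):
    """Build a taskset CPU list with exactly num_threads logical CPUs.

    Half of the CPUs start at 0 and the other half start at sibling_offset
    (an assumption taken from the 0-7,80-87 example).
    """
    if num_threads <= 0 or num_threads % 2:
        raise ValueError("num_threads must be a positive even number")
    half = num_threads // 2

    def span(start):
        end = start + half - 1
        return str(start) if end == start else f"{start}-{end}"

    return f"{span(0)},{span(sibling_offset)}"


def benchmark_command(num_threads, weights, sibling_offset=80):
    """Return the pinned benchmark command as an argument list."""
    return [
        "taskset", "-c", cpu_list(num_threads, sibling_offset),
        "./lc0", "benchmark", "-b", "eigen",
        f"--threads={num_threads}", "-w", weights, "--num-positions=1",
    ]
```

The returned list can be passed to `subprocess.run` or joined with spaces to print the command for a given thread count.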
The results are in Table 1. In the bad case, performance drops by 60% and 78% compared with the normal case.
Table 1: Performance under CPU constraint
This PR suggests skipping the proactive wake-up of TaskWorkers, since it causes a performance drop when CPU resources are constrained.
Environment:
Hardware: Intel Ice Lake platform, 2 sockets, 80 physical cores, 160 logical cores
OS: CentOS Stream 8 with kernel version 6.2