LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Use back-off spin to wait for available searcher #1890

Closed pdeng6 closed 1 year ago

pdeng6 commented 1 year ago

This change improves performance in high-core-count scenarios and reduces CPU utilization.

Environment:

Hardware: Intel Ice Lake platform, 2 sockets / 80 physical cores / 160 logical cores

OS: CentOS Stream 9

Benchmark: ./lc0 benchmark -b eigen --threads=$num_threads -w b30e742bcfd905815e0e7dbd4e1bafb41ade748f85d006b8e28758f1a3107ae3 --num-positions=34 --movetime=-1 --nodes=26000

Problems:

  1. Spinning wastes CPU cycles, especially when other functions or applications also need the CPU.
  2. Even when nothing else is competing for the CPU, high-contention spinning degrades performance. For example, with num_threads=128, spin-waiting on pendingsearchers takes ~18% of CPU cycles. With num_threads=160, it takes ~88% of CPU cycles, and the score drops to ~50% of the num_threads=128 score.

Solution in this change: exponential back-off spinning with sleep, which reduces contention on pendingsearchers and releases the CPU when no searcher has been available for a long time. With this change:

  1. CPU utilization is reduced as num_threads goes from 64 to 160.
  2. The score stays roughly flat up to 128 threads and improves between 128 and 160.


| opt/base - 1 | 8 | 16 | 32 | 64 | 128 | 160 |
| -- | -- | -- | -- | -- | -- | -- |
| CPU utilization (count) change | 0% | 0% | -1% | -3% | -13% | -36% |
| Performance improvement | 0% | 1% | -1% | 1% | 0% | 67% |
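For readers who want a concrete picture of the back-off described above, here is a minimal sketch. It is not the code from this PR. The names `pending_searchers`, `stop_requested`, and `AcquireSearcherWithBackoff`, the spin cap, and the sleep interval are all assumptions chosen for illustration, and `std::this_thread::yield()` stands in for whatever pause primitive the real change uses.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-ins for the shared search state (not lc0's real members).
std::atomic<int> pending_searchers{0};     // number of free searcher slots
std::atomic<bool> stop_requested{false};   // set when the search should end

// Tries to claim one searcher slot. Returns true once a slot is claimed,
// or false if a stop was requested before a slot became available.
bool AcquireSearcherWithBackoff() {
  constexpr int kMaxSpins = 1 << 10;                       // assumed, untuned
  constexpr auto kSleep = std::chrono::microseconds(50);   // assumed, untuned
  int spins = 1;
  while (!stop_requested.load(std::memory_order_acquire)) {
    int available = pending_searchers.load(std::memory_order_acquire);
    if (available > 0 &&
        pending_searchers.compare_exchange_weak(
            available, available - 1, std::memory_order_acq_rel)) {
      return true;  // claimed a slot
    }
    if (spins < kMaxSpins) {
      // Exponential phase: wait twice as long after every failed attempt,
      // so contending threads touch the shared counter less and less often.
      for (int i = 0; i < spins; ++i) std::this_thread::yield();
      spins *= 2;
    } else {
      // The slot has been unavailable for a long time: give the CPU back.
      std::this_thread::sleep_for(kSleep);
    }
  }
  return false;
}
```

The first retries stay cheap, which keeps latency low when a slot frees up quickly. Only a long wait reaches the sleep, which is where the CPU utilization savings at high thread counts would come from.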

pdeng6 commented 1 year ago

@borg323 could you please help review this PR? thanks!

borg323 commented 1 year ago

Can you make it a configuration option? Our main use case is 2 threads with a gpu backend and the spin locks were added for low latency there.

pdeng6 commented 1 year ago

Yes. I would suggest turning on back-off when the thread count is above 64. Please take a look, @borg323.

Thanks, Pan

pdeng6 commented 1 year ago

Thanks. A separate command-line option is now used. Please take a look, @borg323.
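As a rough illustration of how such an option might gate the two wait strategies, consider the sketch below. The thread does not give the actual lc0 flag name, so `WaitOptions`, `use_backoff`, `WaitStep`, and the constants are hypothetical. The default keeps the hard spin that the 2-thread GPU use case relies on for low latency.

```cpp
#include <chrono>
#include <thread>

// Hypothetical option holder; the real lc0 option name is not shown here.
struct WaitOptions {
  bool use_backoff = false;  // off by default: keep the low-latency hard spin
};

// Performs one wait step after a failed attempt to claim a searcher slot.
// `attempt` is the number of consecutive failures so far; the return value
// is the attempt count to pass to the next call.
int WaitStep(const WaitOptions& options, int attempt) {
  if (!options.use_backoff) {
    return attempt;  // hard spin: retry immediately, no waiting
  }
  constexpr int kSpinAttempts = 10;  // assumed, untuned
  if (attempt < kSpinAttempts) {
    // Back-off phase: yield 2^attempt times before the next retry.
    for (int i = 0; i < (1 << attempt); ++i) std::this_thread::yield();
  } else {
    // Long wait: sleep so other work can use the core.
    std::this_thread::sleep_for(std::chrono::microseconds(50));
  }
  return attempt + 1;
}
```

With the option off, behavior is unchanged from the original spin. Users running many CPU threads can opt in without affecting the low-thread-count GPU path.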

pdeng6 commented 1 year ago

Comments addressed, please take a look @borg323, thanks a lot!

pdeng6 commented 1 year ago

Rebased, could you please take a look? @borg323

pdeng6 commented 1 year ago

Thanks a lot for help, comments addressed, please take a look @borg323

pdeng6 commented 1 year ago

@borg323 are we ready to merge? :)

pdeng6 commented 1 year ago

Thanks, @borg323, for reviewing this PR. Since I don't have write access, could you please help merge it?

Best regards, Pan

borg323 commented 1 year ago

I'll do it on the next round of merges.

pdeng6 commented 1 year ago

> I'll do it on the next round of merges.

Thanks a lot.

borg323 commented 1 year ago

Please consider joining our discord chat http://lc0.org/chat - I'm really interested in your observations with lc0 at large thread counts.