LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0
2.46k stars, 535 forks

Training using not many computer resources #1192

Closed dangi12012 closed 4 years ago

dangi12012 commented 4 years ago

BUG REPORT

Running client.exe uses 15% of my CPU and 8% of my GPU time. I don't think this is intended; it seems the software could spawn more instances of itself for training, or use C++ futures internally (more workers for the training task).

Having more instances per client.exe would also overlap memory-bound, CPU-bound and GPU-bound threads inside a single process and increase performance.

1 instance: 450 games/day
2 instances: 900 games/day

What is confusing is that neither the CPU nor the GPU is pegged at 100%, which would imply that the bottleneck is either intentional or due to unoptimized code.

I would like to have a client.exe that can use all CPU cores and all GPUs at once, like GPU mining software.
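The "more workers via C++ futures" idea above could look roughly like the following. This is only a minimal sketch of the pattern, not how lc0 or client.exe is actually structured: `PlayGames` and `RunWorkers` are hypothetical names, and the worker body is a placeholder that just counts games.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for one self-play worker: "plays" the requested
// number of games and returns how many it finished. A real worker would
// run search on the CPU and evaluate positions on the GPU.
int PlayGames(int games_to_play) {
  int finished = 0;
  for (int i = 0; i < games_to_play; ++i) {
    ++finished;  // Placeholder for playing one game.
  }
  return finished;
}

// Launches num_workers workers concurrently and sums their results.
int RunWorkers(int num_workers, int games_per_worker) {
  assert(num_workers > 0);
  std::vector<std::future<int>> pending;
  pending.reserve(num_workers);
  for (int i = 0; i < num_workers; ++i) {
    // std::launch::async runs each worker on its own thread.
    pending.push_back(
        std::async(std::launch::async, PlayGames, games_per_worker));
  }
  int total = 0;
  for (auto& f : pending) {
    total += f.get();  // Blocks until that worker is done.
  }
  return total;
}
```

Calling `RunWorkers(2, 450)` adds up the results of two workers running side by side. Whether real throughput scales this way depends on where the actual bottleneck is.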

Steps to Reproduce

  1. Run client.exe
  2. Observe Resource Monitor and CUDA Monitor
  3. Observe 6% GPU time and 15% CPU time

Lc0 version

v0.24.1+git.4b8acff built Mar 15 2020

Lc0 parameters

client.exe

Hardware

SOLVE: Have a database that polls the client hardware and downloads an appropriate network for the GPU (12x 2080 Ti is not the same as a laptop 760 Ti). There are not that many GPU IDs out there, and it would be easy for the network backend to get more contribution to lc0 this way.
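As a rough illustration of that lookup, here is a minimal sketch. Everything in it is hypothetical: the function name `PickNetwork`, the table entries and the "large"/"small" labels are placeholders, not real lc0 network names or an existing server feature.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table mapping a GPU model name to a network size label.
// The entries and labels are placeholders, not real lc0 network names.
std::string PickNetwork(const std::string& gpu_model) {
  assert(!gpu_model.empty());
  static const std::map<std::string, std::string> kNetworkByGpu = {
      {"2080 Ti", "large"},
      {"760 Ti", "small"},
  };
  const auto it = kNetworkByGpu.find(gpu_model);
  // Unknown hardware falls back to the smallest network.
  return it != kNetworkByGpu.end() ? it->second : "small";
}
```

A client would report its GPU model once and the server would answer with the matching label. The fallback keeps unknown hardware from being handed a network that is too heavy for it.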

mooskagh commented 4 years ago

Windows 10 GPU utilization does not measure compute utilization. To measure compute utilization, you need to look at the Compute_0 (or CUDA) queue utilization by selecting it via the little down arrows on the GPU screen in the Performance tab.
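As a cross-check outside Task Manager, and assuming an NVIDIA GPU with `nvidia-smi` on the PATH, the command `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` prints one utilization percentage per GPU. The sketch below only parses that text; the helper names `ParseGpuUtilization` and `AnyGpuUnderutilized` are made up for illustration and are not part of lc0.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parses the text printed by
//   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
// which is assumed to be one integer percentage per line, one line per GPU.
std::vector<int> ParseGpuUtilization(const std::string& output) {
  std::vector<int> percents;
  std::istringstream lines(output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream field(line);
    int value = 0;
    if (field >> value) {  // Skips blank or non-numeric lines.
      percents.push_back(value);
    }
  }
  return percents;
}

// Returns true if any GPU is below the given utilization threshold (0-100).
bool AnyGpuUnderutilized(const std::vector<int>& percents, int threshold) {
  assert(threshold >= 0 && threshold <= 100);
  for (int p : percents) {
    if (p < threshold) return true;
  }
  return false;
}
```

Feeding it the captured command output while the client is running gives one number per GPU, which can then be compared with what the Compute_0 or CUDA graph shows.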


If you are still sure that the GPU is underutilized, could you please join the #help channel on our Discord chat at http://lc0.org/chat? That way it would be faster to debug the problem.

The client should easily load the GPU fully; you are the first to report this problem.

Naphthalin commented 4 years ago

Not an issue, "solution" was posted, author never responded. Can probably be closed.