QueensGambit / CrazyAra

A Deep Learning UCI-Chess Variant Engine written in C++ & Python :parrot:
https://lichess.org/@/CrazyAra
GNU General Public License v3.0
247 stars 42 forks source link

cuDNN library not fully used on Windows #80

Open QueensGambit opened 3 years ago

QueensGambit commented 3 years ago

The library cudnn_cnn_infer64_8.dll is not used on Windows, but libcudnn_cnn_infer.so.8 is used on Linux. This seems to make a visible NPS difference.

e.g. Ubuntu 18.04:

GPU: RTX 2070 OC

isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 18522 nps 18485 tbhits 0 time 1002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 38347 nps 19154 tbhits 0 time 2002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 37 multipv 1 score cp 47 nodes 57007 nps 18990 tbhits 0 time 3002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1

GPU-Utility: 91%

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P2   152W / 215W |    677MiB /  7982MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0  On |                  N/A |
|  0%   54C    P2    56W / 250W |    441MiB / 11177MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

e.g. Windows 10:

GPU: RTX 2070 OC

isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 16500 nps 16369 tbhits 0 time 1008 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 33367 nps 16584 tbhits 0 time 2012 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 33 multipv 1 score cp 47 nodes 50400 nps 16617 tbhits 0 time 3033 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1

GPU-Utility: 85%

C:\Windows\System32\DriverStore\FileRepository\nv_dispui.inf_amd64_c1f8f32cc9af9677>nvidia-smi
Fri Apr  9 17:37:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.33       Driver Version: 461.33       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207... WDDM  | 00000000:01:00.0 Off |                  N/A |
| 29%   59C    P2   141W / 215W |    845MiB /  8192MiB |     85%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 00000000:0B:00.0  On |                  N/A |
|  0%   38C    P8    17W / 250W |    692MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
QueensGambit commented 3 years ago

I was able to link the binary to cudnn_cnn_infer64_8.dll but this didn't seem to help unfortunately.

Also adding certain optimization options such as /O2 (Maximize Speed), /GL (Whole program optimization), /LTCG (Link-time code generation) didn't result in a NPS improvement.