LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

error CUDA error: unknown error (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:203) on go nodes 100 #911

Closed ghost closed 4 years ago

ghost commented 5 years ago

When I try to launch the program and send the command go nodes 100 it hangs with this error:

error CUDA error: unknown error (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:203)

I have an Nvidia GeForce 780 Ti OC. I updated to the newest drivers and installed the CUDA toolkit, but nothing helped...

The program ran smoothly (CUDA flavour) until a couple of weeks ago...

borg323 commented 5 years ago

Can you check that the drivers you have are the latest from nvidia and not from windows update?

ghost commented 5 years ago

Absolutely, I install only original NVIDIA drivers from the NVIDIA website, now through GeForce Experience.

ghost commented 5 years ago

Is it possible that nobody has a clue?? I'm getting this with Lc0 v0.22 as well. I erased all the NVidia drivers, rebooted the system and reinstalled them, to no avail... My graphics card is an NVidia GeForce 780 Ti OC.

ghost commented 5 years ago

Old?? It's the latest lc0 release!! And I can't switch to Windows 10. I want to point out that Lc0 CUDA ran smoothly until a crash that happened when I tried to stream the game with OBS, which was using the graphics card's encoder.

ghost commented 5 years ago

What is the actual newest Lc0 then?

borg323 commented 5 years ago

The latest version is v0.22.0: https://github.com/LeelaChessZero/lc0/releases/tag/v0.22.0

ghost commented 5 years ago

That is just the one I was writing about.

ghost commented 5 years ago

Log file:

============= Log started. =============
0811 07:35:00.840475 4492 c:\projects\lc0\src\main.cc:37] Lc0 started.
0811 07:35:00.840544 4492 c:\projects\lc0\src\main.cc:38]
0811 07:35:00.840741 4492 c:\projects\lc0\src\main.cc:39] | | |
0811 07:35:00.840937 4492 c:\projects\lc0\src\main.cc:40] | | |_| v0.22.0 built Aug 5 2019
0811 07:35:00.844105 4492 c:\projects\lc0\src\utils\commandline.cc:45] Command line: lc0.exe --logfile=log.txt
0811 07:35:04.902445 4492 c:\projects\lc0\src\chess\uciloop.cc:131] >> go nodes 100
0811 07:35:04.903053 4492 c:\projects\lc0\src\neural\loader.cc:206] Found pb network file: ./1a167a875c3d9e242f663f30ba877b5b046dcb0c193b79fd43ddacbf8b5b17ed.gz
0811 07:35:05.532228 4492 c:\projects\lc0\src\neural\factory.cc:84] Creating backend [cudnn]...
0811 07:35:05.703535 4492 c:\projects\lc0\src\utils\exception.h:39] Exception: CUDA error: unknown error (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:203)
0811 07:35:05.726708 4492 c:\projects\lc0\src\chess\uciloop.cc:218] << error CUDA error: unknown error (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:203)

mooskagh commented 5 years ago

Usually the reason for that is a mismatch between the CUDA .dlls and the CUDA drivers (e.g. 10.0.xxx vs 10.1.xxx). Updating the NVidia drivers and rebooting should help.
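
The mismatch described above can be illustrated with a small check. This is only a sketch under one assumption: a CUDA runtime works when its major.minor version is not newer than the latest CUDA version the driver supports. The function names `parse_version` and `runtime_fits_driver` are hypothetical and are not part of lc0 or of any NVIDIA tool.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '10.1.0'."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def runtime_fits_driver(runtime: str, driver_max: str) -> bool:
    """Assumed rule: the runtime DLLs must not be newer than the latest
    CUDA version supported by the installed driver."""
    return parse_version(runtime) <= parse_version(driver_max)


# Runtime 10.0 DLLs with a driver that supports up to 10.1: compatible.
print(runtime_fits_driver("10.0.0", "10.1.0"))
# Runtime 10.1 DLLs with a driver that only supports 10.0: mismatch.
print(runtime_fits_driver("10.1.0", "10.0.0"))
```

Comparing tuples rather than raw strings avoids ordering mistakes such as "9.2" sorting after "10.0".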

ghost commented 5 years ago

As I've said so many times, I have updated the drivers every single time. Yet:

c:\lc0>lc0.exe
| | | | | |_| v0.22.0 built Aug 5 2019
go nodes 100
Found pb network file: ./1.gz
Creating backend [cudnn]...
error CUDA error: unknown error (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:203)

borg323 commented 5 years ago

@massimilianogoi can you try with https://ci.appveyor.com/api/buildjobs/cbesqskj5kgvte3p/artifacts/lc0-windows-gpu-nvidia-cuda.zip, it is a build with cuda 9.2 dlls.

ghost commented 5 years ago

@borg323 always the same error:

c:\lc0>lc0.exe
| | | | | |_| v0.23.0-dev+git.6d7c1e3 built Sep 3 2019
go nodes 100
Found pb network file: ./b2ec465d0fb5b5eb39d2e1e3f74041a5d2fc92d413b71aa7ea0b6fb082ccba9c.gz
Creating backend [cudnn]...
error CUDA error: unknown error (..\src/neural/cuda/network_cudnn.cc:203)

borg323 commented 5 years ago

@massimilianogoi this one has improved error reporting: https://ci.appveyor.com/api/buildjobs/npaav8gj954w7f8t/artifacts/build%2Flc0.exe. It is only the exe, you will need the dlls from the release zip (not the cuda 9.2 ones from the previous test).

ghost commented 5 years ago

@borg323

   _
| | | | | || v0.23.0-dev+git.8210e2c built Sep 9 2019
go nodes 100
Found pb network file: ./6e404a13dab65d9b06822575e9b0a96c2984ba207b31e3fbe5e26c3163474499
Creating backend [cudnn]...
CUDA Runtime version: 566418.33.6
WARNING: CUDA Runtime version mismatch, was compiled with version 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 10.1.0

I just registered for the NVidia developer program... only to see that they have built Cudnn for a thousand Linux variants, but on Windows only for Windows 7 and Windows 10... -_-

Apparently the problem is that I have Windows 8.1 then...

It's a pity, since Lc0 worked until a few months ago... Anyway, I would make the build with the improved error reporting the official one.

borg323 commented 5 years ago

The cuda dlls included with lc0 are the windows 10 ones. Maybe a system update changed something that affects compatibility recently. Assuming the cuda you installed was the windows 8.1 version, can you replace cudart.dll and cublas.dll in the lc0 directory with the ones from the cuda installation and try again?

borg323 commented 5 years ago

@massimilianogoi can you try https://ci.appveyor.com/api/buildjobs/cue9tlx2qpxqcmoh/artifacts/build%2Flc0.exe? It has some additional diagnostics to see why you get the strange cuda version.

ghost commented 5 years ago

C:\lc0>lc0.exe
| | | | | |_| v0.23.0-dev+git.e0f705f built Sep 17 2019
go nodes 100
Found pb network file: ./6e404a13dab65d9b06822575e9b0a96c2984ba207b31e3fbe5e26c3163474499
Creating backend [cudnn]...
CUDA Runtime version: 60470.53.6
WARNING: CUDA Runtime version mismatch, was compiled with version 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 10.1.0
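
To see why the reported runtime versions (566418.33.6 and 60470.53.6) look strange, it may help to decode them. The sketch below assumes that CUDA encodes a version as a single integer, 1000 * major + 10 * minor, and that the diagnostic build prints it as major.minor.patch with the last decimal digit as the patch. The second part is a guess about the diagnostic code, not something confirmed in this thread. The helper names `decode_cuda_version` and `encode_cuda_version` are hypothetical.

```python
def decode_cuda_version(raw: int) -> str:
    """Format a raw CUDA version integer as 'major.minor.patch'
    (assumed layout: 1000 * major + 10 * minor + patch)."""
    major = raw // 1000
    minor = (raw // 10) % 100
    patch = raw % 10
    return f"{major}.{minor}.{patch}"


def encode_cuda_version(text: str) -> int:
    """Inverse of decode_cuda_version for a 'major.minor.patch' string."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major * 1000 + minor * 10 + patch


# Under the assumed layout, 10010 and 10000 decode to the versions
# named elsewhere in the output above.
print(decode_cuda_version(10010))
print(decode_cuda_version(10000))

# The two reported runtime versions map back to very large raw integers.
print(encode_cuda_version("566418.33.6"))
print(encode_cuda_version("60470.53.6"))
```

Under these assumptions the raw values behind the two reports would be 566418336 and 60470536, which do not correspond to any CUDA release and differ between runs. That would be consistent with the version query not returning a valid value, though the thread does not confirm the cause.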

ghost commented 5 years ago

Trying to substitute the CUDA dll file gives me a type mismatch error.

borg323 commented 5 years ago

Can you tell me which cuda version is installed? Is it 10.1.243 for windows 8.1? I can try to make a build with the exact same version.

ghost commented 5 years ago

Thanks. I copy the full string:

NVCUDA.DLL 26.21.14.3615 NVIDIA CUDA 10.1.0 driver
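
The file version in that string can be mapped to the driver release number usually shown in the NVIDIA control panel. The sketch below assumes the common convention for NVIDIA Windows drivers that the last five digits of the file version form the release number. The helper name `driver_release` is hypothetical.

```python
def driver_release(file_version: str) -> str:
    """Derive the driver release (e.g. '436.15') from a DLL file version,
    assuming the last five digits encode it as XXX.YY."""
    digits = file_version.replace(".", "")
    last_five = digits[-5:]
    return f"{last_five[:3]}.{last_five[3:]}"


print(driver_release("26.21.14.3615"))  # 436.15
```

Note that this identifies the driver release only; it says nothing about which CUDA toolkit version, such as 10.1.243, is installed.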

Naphthalin commented 4 years ago

@massimilianogoi did you ever resolve the issue? If not, can you please try again with the current versions?

mooskagh commented 4 years ago

Closing for now as there's no activity on this issue. Feel free to reopen if there are any updates.