dje-dev / Ceres

Ceres - an MCTS chess engine for research and recreation
GNU General Public License v3.0

Unable to find an entry point named 'Alloc' in DLL 'LC0'. #10

Closed cyrenaique closed 3 years ago

cyrenaique commented 3 years ago

Hello, when running a command like go nodes 10 I got this error and Ceres crashed. I compiled LC0 from source from https://github.com/dje-dev/lc0 on Windows 10 with VS2019 and CUDA 10.2. Any ideas where I made a mistake? Thanks, Arnaud

dje-dev commented 3 years ago

I assume this procedure was followed? https://github.com/dje-dev/Ceres/blob/main/BuildDLL.md Key steps:

  1. Replace one C++ file in the LC0 distribution with the version in the Ceres source code. Sorry, there is a typo in the instructions: the file is network_cudnn.cc, not network_cuda.cc. It is taken from \src\Ceres.Chess\NNEvaluators\LC0DLL\network_cudnn.cc
  2. Change the C++ project file to output a DLL instead of an EXE. Then build and copy LC0.DLL to the appropriate directory (if it is not already there).

cyrenaique commented 3 years ago

Thanks a lot for the answer, I actually did that and still have the same issue. I will double check. Thanks again.

eahova commented 3 years ago

@cyrenaique - I was having the same issues. I did this to get around the error:

1) Downloaded cuDNN and pointed CUDNN_PATH at the cuDNN directory. Previously I had it set to the same directory as CUDA, but then the build didn't even use the network_cudnn.cc file (I could delete that file and it didn't care).

   set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
   set CUDNN_PATH=C:\Users\e\Downloads\cudnn-11.1-windows-x64-v8.0.5.39\cuda

2) Commented out the "std::unique_ptr MakeCudaNetworkAuto" method declaration in the network_cuda.cc file. I just slapped a /* */ around it and also took out the 3 REGISTER_NETWORK calls at the end (not sure that was needed). It was giving me a build error: "Class already defined in src_neural_cuda_network_cudnn.cc.obj"

It seems to build for me now and no longer crashes when I try "go nodes 10" as in the setup example.

Perhaps someone else has a better solution, or maybe there are some changes needed to the network_cuda.cc file as well that weren't included.

Good luck

gsobala commented 3 years ago

You don't need to build the cuda backend at all, only the cudnn backend. Either edit meson_options.txt and change plain_cuda to false or add SET PLAIN_CUDA=false at the top of build.cmd. That gets rid of the clashes between the two files in src/neural/cuda.

Secondly, the current DLL build instructions are an unholy mess, partly because NVIDIA currently has cuDNN files for CUDA 11.1 but not for 11.2. It's perfectly possible to build lc0.dll against just 11.1, but it will only run if the user has the CUDA 11.1 libraries installed or the runtimes available.
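
For anyone who prefers to script the meson_options.txt edit mentioned above, here is a minimal Python sketch. It assumes the option is declared in the usual Meson form, option('plain_cuda', ... value: true ...); that layout and the helper name disable_plain_cuda are assumptions for illustration, so check your own copy of the file before relying on it.

```python
import re


def disable_plain_cuda(options_text: str) -> str:
    """Return options_text with the plain_cuda default changed to false.

    Assumes a declaration like: option('plain_cuda', ..., value: true, ...).
    """
    # Stay inside the option('plain_cuda', ...) call: [^)] stops the match
    # at the first closing parenthesis, so later options are never touched.
    pattern = re.compile(r"(option\(\s*'plain_cuda'[^)]*?value\s*:\s*)true")
    new_text, count = pattern.subn(r"\g<1>false", options_text, count=1)
    if count == 0:
        raise ValueError("no plain_cuda option with 'value: true' was found")
    return new_text


# Hypothetical excerpt, only to show the transformation.
sample = """option('plain_cuda',
       type: 'boolean',
       value: true,
       description: 'Enable CUDA backend')
"""

print(disable_plain_cuda(sample))
```

To apply it to a real file, read meson_options.txt into a string, pass it through the function, and write the result back. Editing the file by hand achieves exactly the same thing.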

eahova commented 3 years ago

Thanks for the info. I reverted my changes to network_cuda.cc to try @gsobala's suggestions.

  1. added SET PLAIN_CUDA=false at the top of build.cmd --> this didn't seem to work, I still got the clash on build
  2. tried to use CUDA\v11.1 instead of CUDA\v11.2 --> the build didn't detect cudnn-11.1 in that v11.1 CUDNN_PATH directory, so I assume it would go back to the missing entry point named 'Alloc' error
  3. kept CUDNN_PATH pointed to the cudnn-11.1 directory and edited meson_options.txt to change plain_cuda to false --> this worked, no clash, built fine

I think it makes sense to clarify the build instructions with these tips:

  1. Edit meson_options.txt to disable plain_cuda (if you get build clashes)
  2. Verify that you see "Library cudnn found: YES" when you run build.cmd (otherwise it isn't going to build the new files)
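
The second tip can be checked mechanically if you save the output of build.cmd to a file. The Python sketch below only looks for the exact "Library cudnn found: YES" line quoted above. The function name and the sample log lines are hypothetical, and how you capture the output is up to you.

```python
def cudnn_was_found(build_output: str) -> bool:
    """Return True if the build output reports that cuDNN was detected."""
    marker = "Library cudnn found: YES"
    # Scan line by line so a "found: NO" line elsewhere cannot match.
    return any(marker in line for line in build_output.splitlines())


# Hypothetical log excerpts, only to show both outcomes.
good_log = "Library cublas found: YES\nLibrary cudnn found: YES\n"
bad_log = "Library cublas found: YES\nLibrary cudnn found: NO\n"

print(cudnn_was_found(good_log))
print(cudnn_was_found(bad_log))
```

If the function returns False for your log, CUDNN_PATH is the first thing to re-check, since the new network_cudnn.cc will not be compiled in that case.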

dje-dev commented 3 years ago

Thank you all for the clarifications. To be frank I don't understand these build issues very well. I will add the suggested tips.

The LC0.DLL code is not particularly clean or elegant (but does work flawlessly if you can get it to build). I have not invested a lot of time here because I'm hoping the LC0 developers will create an API to make the backends reusable by any engine, not just Ceres. The backends are a gem (very well tuned and able to run on many different types of hardware). If that happens, the LC0 developers will do so with a better design and better C++ project build skills than I have.

AlexanderSWilliams commented 3 years ago

I followed eahova's steps and am receiving the same error. The only obvious difference is that I installed CUDA 11.2 because I do not have an older version of the installer. Library cudnn is found and I edited the meson_options.txt file. Is there anything else I should try?

eahova commented 3 years ago

Are you getting the build error (code clash) or the runtime error (unable to find "alloc")?

AlexanderSWilliams commented 3 years ago

Unable to find "alloc".

eahova commented 3 years ago

You want to make sure it is compiling using the new network_cudnn.cc file. One thing to try is to make certain it is actually using the file in the build.

Go rename that file so it no longer exists as "network_cudnn.cc" and then rebuild the solution....if it still builds fine even without that file then you know what the problem is....
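
A complementary check is to ask the built DLL directly whether it exports the name from the error message. The following is a rough Python sketch using ctypes. The helper names are made up for illustration, and 'Alloc' is the only entry point name known from this thread, so any other names Ceres may look up are not covered.

```python
import ctypes


def missing_exports(library, names):
    """Return the names that cannot be resolved on an already loaded library."""
    missing = []
    for name in names:
        try:
            # ctypes resolves exported symbols lazily on attribute access
            # and raises AttributeError when the symbol does not exist.
            getattr(library, name)
        except AttributeError:
            missing.append(name)
    return missing


def check_dll(path, names=("Alloc",)):
    """Load the DLL at path and report which of the given names are missing."""
    library = ctypes.CDLL(path)
    return missing_exports(library, list(names))
```

Calling check_dll with the path to your LC0.DLL should return an empty list if the entry point is present. Loading can itself fail with OSError if the CUDA or cuDNN libraries the DLL depends on cannot be located, which is a different problem from a missing export.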

You might also want to verify that the cudnn directory is in your PATH so Visual Studio can find it....
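
A quick way to do that PATH check is the Python sketch below, which lists the PATH directories containing a file that matches a pattern. The pattern "cudnn*.dll" is an assumption about how the cuDNN DLLs are named on your machine, and the function name is hypothetical.

```python
import glob
import os


def dirs_on_path_containing(pattern, path_value=None):
    """Return the directories in PATH that hold a file matching pattern."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        # Escape the directory so brackets in its name are taken literally.
        candidate = os.path.join(glob.escape(directory), pattern)
        if glob.glob(candidate):
            hits.append(directory)
    return hits
```

For example, dirs_on_path_containing("cudnn*.dll") returning an empty list would suggest that the cuDNN bin directory is not on PATH in the shell where you ran it.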

AlexanderSWilliams commented 3 years ago

Does anyone know the commit hash of the lc0 project where network_cudnn.cc was forked from? The diffs of the most current versions are rather substantial.

AlexanderSWilliams commented 3 years ago

I got past that error with "network_cudnn.cc". However, I'm opening a new issue for a new problem I'm running into.