Open Jay-Jay-D opened 2 years ago
The nvcc compiler reports a warning where there should be none (it's the code line static void init() { [[maybe_unused]] static CudaInitializer instance; }).
A quick fix for you could be to disable the option which treats warnings as errors. This can be done by removing the line add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:--Werror=all-warnings>) in CMakeLists.txt.
Does it work?
But I'd like to understand the reason... Which version of the CUDA Toolkit are you using?
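For readers following along, here is a minimal sketch of the pattern behind the init() line quoted above, plus one alternative way to mark the variable as used. Initializer and constructionCount are hypothetical stand-ins, not the project's CudaInitializer, and it is only an assumption (not verified against nvcc 11.4) that the void cast avoids the warning there.

```cpp
#include <cassert>

// Hypothetical helper: counts how often the stand-in constructor runs.
inline int& constructionCount() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for CudaInitializer; the real class presumably
// performs one-time CUDA setup in its constructor.
struct Initializer {
    Initializer() { ++constructionCount(); }
};

// Variant from the thread: [[maybe_unused]] is meant to silence
// "unused variable" diagnostics, yet the reported nvcc still warned.
static void initWithAttribute() {
    [[maybe_unused]] static Initializer instance;
}

// Alternative (assumption: accepted by the affected nvcc): casting the
// variable to void counts as a use, so no attribute is needed.
static void initWithVoidCast() {
    static Initializer instance;
    (void)instance;
}
```

In both variants the function-local static is constructed once, on the first call, so repeated calls to either function do not rerun the setup.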
I tried the quick fix you suggested; the build continued up to 98% and then failed. Here is the build log.
Below is the output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 40% 37C P8 12W / 120W | 670MiB / 6075MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1774 G /usr/lib/xorg/Xorg 416MiB |
| 0 N/A N/A 2170 G /usr/bin/gnome-shell 64MiB |
| 0 N/A N/A 10508 G /usr/lib/firefox/firefox 163MiB |
| 0 N/A N/A 10837 G gnome-control-center 1MiB |
| 0 N/A N/A 14897 G ...414957901845057616,131072 19MiB |
+-----------------------------------------------------------------------------+
Thank you very much for your quick response!
I'm not an expert on such linker problems, but it seems to be the same as in https://githubhot.com/repo/owl-project/owl/issues/144 (last post). A workaround is given there. I've modified the build scripts on the test/nvlink-error branch accordingly. Could you please try it?
Thank you very much for the help! Sadly, it didn't work; the output is the same as with the quick fix.
Disclaimer: this topic is WAY above my head, so, sorry if I'm playing Captain Obvious.
I started digging into the issue from the error nvlink fatal : Could not open input file '/usr/lib/x86_64-linux-gnu/librt.a'. The file exists, but its content is a single line.
jjd@pop-os:~$ ll /usr/lib/x86_64-linux-gnu/librt.a
-rw-r--r-- 1 root root 8 Feb 24 16:45 /usr/lib/x86_64-linux-gnu/librt.a
jjd@pop-os:~$ cat /usr/lib/x86_64-linux-gnu/librt.a
!<arch>
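The 8-byte size in the listing above matches the ar magic string "!<arch>" plus a newline, which means the archive is valid but has no members. A minimal sketch of a check for that condition follows; isEmptyArchive is a hypothetical helper written for illustration, not part of any toolchain.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns true when the file at `path` consists solely of the 8-byte
// ar magic "!<arch>\n", i.e. it is an archive without any members.
// A missing or unreadable file is reported as false.
bool isEmptyArchive(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    char buffer[9];
    // Try to read one byte more than the magic to detect extra content.
    in.read(buffer, sizeof buffer);
    return in.gcount() == 8 && std::string(buffer, 8) == "!<arch>\n";
}
```

Calling isEmptyArchive("/usr/lib/x86_64-linux-gnu/librt.a") on the system described here would be expected to return true, given the 8-byte file shown in the listing.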
There was a glibc update in August last year, and this answer in the Arch subreddit points to that update as a possible source of problems with the Nvidia linker.
A little further in the error message, it points to CMakeFiles/alien.dir/build.make:883. The content of lines 881 to 883 is:
CMakeFiles/alien.dir/cmake_device_link.o: CMakeFiles/alien.dir/dlink.txt
@$(CMAKE_COMMAND) -E cmake_echo_color --switch=$(COLOR) --green --bold --progress-dir=/home/jjd/REPOS/alien/build/CMakeFiles --progress-num=$(CMAKE_PROGRESS_52) "Linking CUDA device code CMakeFiles/alien.dir/cmake_device_link.o"
$(CMAKE_COMMAND) -E cmake_link_script CMakeFiles/alien.dir/dlink.txt --verbose=$(VERBOSE)
Finally, in CMakeFiles/alien.dir/dlink.txt we find the usage of librt.a (column 3556), followed by -ldl, -lrt and -lpthread at column 3860.
So, if those files are empty and cause the error, maybe the solution is to somehow fix those links when generating the dlink.txt file?
Or perhaps it is related to this issue?
Hope this helps to further understand the issue.
I don't think that I can really help here :-( The linker error on your system seems to occur in general when compiling a CUDA project via a CMake script... Probably one could also try to ask here https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-nvcc-compiler/454 ? I would be interested in the answer. However, I cannot reproduce the problem on my system.
I ran into the same build error on Debian testing/sid while compiling v3.2.1 (the "unused" warning).
After I deleted the unused variable in CudaSimulationFacade.cu, everything compiled and seems to run just fine. (Btw. really cool project, thanks a lot.)
System information, in case it helps someone:
$ dpkg -l|grep -i nvidia-cuda
ii nvidia-cuda-dev:amd64 11.4.3-4 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.4.120~11.4.3-4 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 11.4.3-4 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.4.3-4 all NVIDIA CUDA and OpenCL documentation
$ nvidia-smi
Mon Jul 25 11:57:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:09:00.0 On | N/A |
| 0% 46C P8 11W / 120W | 389MiB / 5941MiB | 10% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Yes it seems to be the cuda version.
I upgraded from 11.4.3 to 11.5.2 (apt install -t experimental nvidia-cuda-dev nvidia-cuda-toolkit), reverted my change, and created a new build folder; the problem disappeared.
System info after update:
ii nvidia-cuda-dev:amd64 11.5.2-1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.5.114~11.5.2-1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 11.5.2-1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.2-1 all NVIDIA CUDA and OpenCL documentation
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08 Driver Version: 510.73.08 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
Thanks, good to know! Besides the issue with the cuda toolkit version, is everything working? (i.e. the program starts normally and simulations can be opened?) I ask because there were some interoperability issues in the past (e.g. simulations saved on a Windows machine could originally not be loaded in Linux).
Yes, I did run most of examples/simulations/. Dark Forest demo runs at ~60 steps per second, others faster. I've tried a few recent ones from the network browser, all good.
It was already working with the old cuda version (with the workaround). I did not run into the linker issue described by Jay-Jay-D, only the initial build-failure (warning).
60 tps seems a bit slow. Which graphics card do you use? How fast is it without rendering (can be toggled with ALT+I)?
GeForce GTX 1660 Ti; without rendering I get ~100 (+/- 5) time steps per second. I have also tested on Windows now (dual-boot): ~70 with UI and 100 (+/-5) without. Could be Gnome doing some extra UI steps, IIRC that's a thing.
Ok, I see. Rendering seems to consume a lot of time here. You can e.g. reduce the frame rate or resolution in the display settings if you want.
I installed all the required dependencies and followed the instructions in the Readme without any problem, but the last step, cmake --build . --config Release -j8, failed.
The error message is
Pop!_OS is based on Ubuntu Impish; the GPU is a GeForce GTX 1060 6GB with the NVIDIA 470 driver.
Please ask for any other information you need to better understand the issue.
Thanks in advance.