Closed azazellochg closed 2 years ago
Where is your /usr/include/thrust from? Older versions of thrust are not compatible with CUDA 11.
It comes from libthrust-dev 1.14.0-1
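To see which thrust release a given include directory actually provides, one can read the THRUST_VERSION macro from thrust/version.h. The sketch below assumes the usual encoding (major*100000 + minor*100 + subminor); the function names and the two directories passed at the end are only illustrative.

```shell
# Decode a THRUST_VERSION integer into major.minor.subminor.
# Assumes the encoding major*100000 + minor*100 + subminor.
decode_thrust_version() {
    v=$1
    echo "$((v / 100000)).$((v / 100 % 1000)).$((v % 100))"
}

# Print the version of the thrust headers under an include directory.
thrust_version_in() {
    dir=${1:-/usr/include}
    header="$dir/thrust/version.h"
    if [ ! -f "$header" ]; then
        echo "no thrust/version.h under $dir"
        return 0
    fi
    # Take the first "#define THRUST_VERSION <digits>" line.
    raw=$(sed -n 's/^#define[[:space:]]\{1,\}THRUST_VERSION[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$header" | head -n 1)
    if [ -z "$raw" ]; then
        echo "THRUST_VERSION not found in $header"
        return 0
    fi
    decode_thrust_version "$raw"
}

# Compare the distribution headers with the ones shipped in a CUDA toolkit.
thrust_version_in /usr/include
thrust_version_in /usr/local/cuda/include
```

Under that encoding a value of 101400 decodes to 1.14.0. If the two directories report different versions, the compiler may be picking up a thrust that does not match the toolkit in use.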
The issue was resolved once the NVIDIA driver was updated to 470.82.00.
Hello! I am having a similar issue with the Relion installation.
This is the error I get.
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda/include/thrust/system/cuda/config.h(107): error: "#" not expected here
/usr/local/cuda/include/thrust/system/cuda/config.h(107): error: expected a ";"
2 errors detected in the compilation of "/home/cryo-em/Desktop/software/em/relion-3.1.3/src/acc/cuda/cuda_projector_plan.cu".
CMake Error at relion_gpu_util_generated_cuda_projector_plan.cu.o.Release.cmake:280 (message):
Error generating file
/home/cryo-em/Desktop/software/em/relion-3.1.3/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/./relion_gpu_util_generated_cuda_projector_plan.cu.o
make[2]: *** [src/apps/CMakeFiles/relion_gpu_util.dir/build.make:114: src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:612: src/apps/CMakeFiles/relion_gpu_util.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
Traceback (most recent call last):
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/__main__.py", line 474, in <module>
main()
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/__main__.py", line 297, in main
installPluginMethods()
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/install_plugin.py", line 259, in installPluginMethods
pinfo.installBin({'args': [binTarget, '-j', numberProcessor]})
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/plugin_funcs.py", line 166, in installBin
environment.execute()
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/funcs.py", line 748, in execute
self._executeTargets(targetList)
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/funcs.py", line 690, in _executeTargets
tgt.execute()
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/funcs.py", line 221, in execute
command.execute()
File "/home/cryo-em/miniconda3/envs/scipion3/lib/python3.8/site-packages/scipion/install/funcs.py", line 161, in execute
assert glob(t), ("target '%s' not built (after "
AssertionError: target '/home/cryo-em/Desktop/software/em/relion-3.1.3/bin/relion_refine' not built (after running 'make -j 1')
Error at main: target '/home/cryo-em/Desktop/software/em/relion-3.1.3/bin/relion_refine' not built (after running 'make -j 1')
Can anyone help me?
First, please check your CUDA SDK and driver versions.
Next, please try compiling via cmake and make directly. Compilation via a wrapper adds another layer of complexity and hides exactly what is happening. Only after you succeed in building it yourself should you try wrappers.
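A minimal sketch of those two steps, run by hand rather than through a wrapper. It assumes nvcc and nvidia-smi are expected on PATH and that the RELION source tree is passed as an argument; the function names are made up for illustration.

```shell
# Show where the CUDA compiler and the driver utility come from, and their versions.
report_cuda_tools() {
    for tool in nvcc nvidia-smi; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found on PATH"
        fi
    done
    # Toolkit (SDK) version as reported by the compiler.
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version 2>&1 | tail -n 2
    fi
    # Driver version as reported by the driver utility.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>&1
    fi
    return 0
}

# Plain out-of-tree build; the subshell body keeps the caller's directory unchanged.
build_by_hand() (
    src=$1
    mkdir -p "$src/build" && cd "$src/build" && cmake .. && make
)

report_cuda_tools
# build_by_hand /path/to/relion    # run once the versions above look sane
```

Running cmake and make directly like this shows the first failing step in full, which a plugin installer tends to bury under its own traceback.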
nvcc --version gives a weird output (I don't understand why): /usr/lib/cuda/bin/nvcc: 3: exec: /usr/lib/nvidia-cuda-toolkit/bin/nvcc: not found
But nvidia-smi prints out this information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01 Driver Version: 510.39.01 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:17:00.0 Off | N/A |
| 37% 48C P2 110W / 350W | 2462MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:65:00.0 On | N/A |
| 34% 44C P8 40W / 350W | 354MiB / 24576MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2565 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 3262 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 260163 C ...ryolo-1.7.6/bin/python3.6 2449MiB |
| 1 N/A N/A 2565 G /usr/lib/xorg/Xorg 53MiB |
| 1 N/A N/A 3262 G /usr/lib/xorg/Xorg 175MiB |
| 1 N/A N/A 3390 G /usr/bin/gnome-shell 61MiB |
| 1 N/A N/A 4728 G ...mviewer/tv_bin/TeamViewer 14MiB |
| 1 N/A N/A 10297 G ...AAAAAAAAA= --shared-files 30MiB |
+-----------------------------------------------------------------------------+
When I run cmake .. I get an error. This is the output:
-- BUILD TYPE set to the default type: 'Release'
-- Setting fallback CUDA_ARCH=35
-- ALLOW_CTF_IN_SAGD enabled - This build of RELION allows modulation of particle images by a contrast transfer function inside stochastic average gradient descent, as specified in Claim 1 of patent US10,282,513B2
-- CUDA enabled - Building CUDA-accelerated version of RELION
-- Setting cpu precision to double
-- Setting accelerated code precision to single
-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version ".")
-- Using non-cuda compilation....
-- MPI_INCLUDE_PATH : /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI_LIBRARIES : /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- MPI_CXX_INCLUDE_PATH : /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI_CXX_LIBRARIES : /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- CMAKE_C_COMPILER : /usr/bin/cc
-- CMAKE_CXX_COMPILER : /usr/bin/c++
-- MPI_C_COMPILER : /usr/bin/mpicc
-- MPI_CXX_COMPILER : /usr/bin/mpicxx
-- CMAKE_CXX_COMPILER_ID : GNU
-- Could NOT find FLTK (missing: FLTK_LIBRARIES FLTK_INCLUDE_DIR FLTK_FLUID_EXECUTABLE)
-- No FLTK installation was found
-- --------------------------------------------------------
-- -------- NO EXISTING FLTK LIBRARIES WHERE FOUND. -------
-- -------------- FLTK WILL BE DOWNLOADED AND -------------
-- --------------- BUILT DURING COMPILE-TIME. -------------
-- --------------------------------------------------------
-- ---- A WORKING INTERNET CONNECTION WILL BE REQUIRED. ---
-- --------------------------------------------------------
-- no previous fltk found, the following paths are set for libs/headers TO BE built
-- FLTK_INCLUDE_DIR: /home/cryo-em/Desktop/software/em/relion/external/fltk/include
-- FLTK_LIBRARIES: /home/cryo-em/Desktop/software/em/relion/external/fltk/lib/libfltk.so
-- Found FFTW
-- FFTW_PATH: /usr/include
-- FFTW_INCLUDES: /usr/include
-- FFTW_LIBRARIES: /usr/lib/x86_64-linux-gnu/libfftw3f.so;/usr/lib/x86_64-linux-gnu/libfftw3.so
BUILD_SHARED_LIBS = OFF
-- Building static libs (larger build size and binaries)
Running apps/CMakeLists.txt...
-- CMAKE_BINARY_DIR:/home/cryo-em/Desktop/software/em/relion/build
-- Git commit ID: 72bbf0c06cea68f8992328703ee5ae5f3d1fc9b7
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cryo-em/Desktop/software/em/relion/build
If I move forward with make, these are the last lines I get:
checking for pkg-config... /usr/bin/pkg-config
Package xft was not found in the pkg-config search path.
Perhaps you should add the directory containing `xft.pc'
to the PKG_CONFIG_PATH environment variable
No package 'xft' found
Package freetype2 was not found in the pkg-config search path.
Perhaps you should add the directory containing `freetype2.pc'
to the PKG_CONFIG_PATH environment variable
No package 'freetype2' found
checking for freetype-config... no
configure: please install pkg-config or use 'configure --disable-xft'.
configure: error: Aborting.
make[2]: *** [CMakeFiles/OWN_FLTK.dir/build.make:110: OWN_FLTK-prefix/src/OWN_FLTK-stamp/OWN_FLTK-configure] Error 1
make[1]: *** [CMakeFiles/Makefile2:243: CMakeFiles/OWN_FLTK.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
Your CUDA installation is broken. Fix it first.
You also have to install xft. This is mentioned in the documentation https://relion.readthedocs.io/en/release-4.0/Installation.html.
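Before re-running make, it may help to confirm that pkg-config can see the libraries the FLTK configure step asked for. A small sketch, assuming pkg-config is the lookup mechanism (as the configure log suggests); the function name is made up here.

```shell
# Report whether pkg-config can find each named package.
check_pkgs() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config: not installed"
        return 0
    fi
    for pkg in "$@"; do
        if pkg-config --exists "$pkg"; then
            echo "$pkg: found ($(pkg-config --modversion "$pkg"))"
        else
            echo "$pkg: missing"
        fi
    done
    return 0
}

# The two packages named in the configure error.
check_pkgs xft freetype2
```

Any package reported as missing needs its development files installed before the bundled FLTK can be configured.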
Seems to be working!
To fix the broken CUDA installation I ran: sudo apt update --fix-missing
But nvcc --version still generated the same result (/usr/lib/cuda/bin/nvcc: 3: exec: /usr/lib/nvidia-cuda-toolkit/bin/nvcc: not found).
For the missing xft I ran sudo apt-get install -y libxft-dev, which for some reason did not get installed when I ran the command reported in the documentation.
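The nvcc message itself hints at what is broken: it has the form sh prints when a script's exec target is missing, so /usr/lib/cuda/bin/nvcc appears to be a small wrapper script whose third line execs /usr/lib/nvidia-cuda-toolkit/bin/nvcc, and that file does not exist. A sketch for pulling the two paths out of such a message and checking them follows; the function names are hypothetical and the dpkg query assumes a Debian-like system.

```shell
# Split "<script>: <line>: exec: <target>: not found" into its two paths.
parse_exec_error() {
    printf '%s\n' "$1" |
        sed -n 's/^\(.*\): [0-9][0-9]*: exec: \(.*\): not found$/wrapper=\1 missing=\2/p'
}

# Report whether a path exists and, where dpkg is available, which package owns it.
describe_path() {
    if [ -e "$1" ]; then
        echo "$1: present"
        if command -v dpkg >/dev/null 2>&1; then
            dpkg -S "$1" 2>/dev/null || echo "$1: not owned by any package"
        fi
    else
        echo "$1: absent"
    fi
    return 0
}

msg='/usr/lib/cuda/bin/nvcc: 3: exec: /usr/lib/nvidia-cuda-toolkit/bin/nvcc: not found'
parse_exec_error "$msg"
describe_path /usr/lib/cuda/bin/nvcc
describe_path /usr/lib/nvidia-cuda-toolkit/bin/nvcc
```

If the wrapper is present but its target is absent, the toolkit package is probably only partly installed. Note that apt update only refreshes the package lists and does not install anything, so it would not be expected to change this result.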
Hello, I'm having very similar issues. Also with CUDA version 11.6 and Nvidia driver 510.47.03.
The cmake command doesn't seem to give any issues:
-- BUILD TYPE set to the default type: 'Release'
-- Using provided CUDA_ARCH=61
-- ALLOW_CTF_IN_SAGD enabled - This build of RELION allows modulation of particle images by a contrast transfer function inside stochastic average gradient descent, as specified in Claim 1 of patent US10,282,513B2
-- CUDA enabled - Building CUDA-accelerated version of RELION
-- Setting cpu precision to double
-- Setting accelerated code precision to single
-- Using cuda wrapper to compile....
-- Cuda version is >= 7.5 and single-precision build, enable double usage warning.
-- MPI_INCLUDE_PATH : /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI_LIBRARIES : /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- MPI_CXX_INCLUDE_PATH : /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI_CXX_LIBRARIES : /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- CMAKE_C_COMPILER : /usr/bin/cc
-- CMAKE_CXX_COMPILER : /usr/bin/c++
-- MPI_C_COMPILER : /usr/bin/mpicc
-- MPI_CXX_COMPILER : /usr/bin/mpicxx
-- CMAKE_CXX_COMPILER_ID : GNU
-- Found previously built non-system FLTK libraries that will be used.
-- FLTK_INCLUDE_DIR: /home/alex/Documents/relion/external/fltk/include
-- FLTK_LIBRARIES: /home/alex/Documents/relion/external/fltk/lib/libfltk.so
-- Found FFTW
-- FFTW_PATH: /usr/include
-- FFTW_INCLUDES: /usr/include
-- FFTW_LIBRARIES: /usr/lib/x86_64-linux-gnu/libfftw3f.so;/usr/lib/x86_64-linux-gnu/libfftw3.so
BUILD_SHARED_LIBS = OFF
-- Building static libs (larger build size and binaries)
Running apps/CMakeLists.txt...
-- CMAKE_BINARY_DIR:/home/alex/Documents/relion/build
-- Git commit ID: 72bbf0c06cea68f8992328703ee5ae5f3d1fc9b7
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alex/Documents/relion/build
But the make command gets stuck at the same point.
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda-11.6/include/thrust/system/cuda/config.h(107): error: "#" not expected here
/usr/local/cuda-11.6/include/thrust/system/cuda/config.h(107): error: expected a ";"
2 errors detected in the compilation of "/home/alex/Documents/relion/src/acc/cuda/cuda_projector_plan.cu".
CMake Error at relion_gpu_util_generated_cuda_projector_plan.cu.o.Release.cmake:280 (message):
Error generating file
/home/alex/Documents/relion/build/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/./relion_gpu_util_generated_cuda_projector_plan.cu.o
make[2]: *** [src/apps/CMakeFiles/relion_gpu_util.dir/build.make:679: src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:675: src/apps/CMakeFiles/relion_gpu_util.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
This is the output when I run it now. Some of the binaries are still compiled, but not all. Any ideas what could be going on?
The only thing I could find is https://github.com/NVIDIA/thrust/issues/979
Small update: Relion 4.0 compiled successfully.
@Alexamk how did you fix that? I have the same problem on a Manjaro Linux installation with CUDA 11.6.0 and NVIDIA driver 510.47.03.
I didn't. I just decided to install 4.0 instead, which worked just fine.
It works. I just forgot to switch to the ver4.0 branch.
Environment:
OS: Debian, kernel 5.16.0-3
OpenMPI 4.1.2
gcc 10.3.0
nvidia driver 470.103.01
nvcc cuda_11.4.r11.4/compiler.30521435_0
RELION version 4.0 ce2e9352da91ad4323a0ebbc00c6796e4b917324
GPU: RTX3090
Make fails with:
[ 2%] Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o
In file included from /usr/include/thrust/system/cuda/detail/execution_policy.h:35,
from /usr/include/thrust/iterator/detail/device_system_tag.h:23,
from /usr/include/thrust/iterator/detail/iterator_facade_category.h:22,
from /usr/include/thrust/iterator/iterator_facade.h:37,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cub/device/../iterator/arg_index_input_iterator.cuh:48,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cub/device/device_reduce.cuh:41,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cuda_utils_cub.cuh:18,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cuda_projector_plan.cu:10:
/usr/include/thrust/system/cuda/config.h:79:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
79 | #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
Is whatever relion has in src/acc/cuda/cub/ now incompatible with cub that's now included in CUDA11 toolkit?
As the message suggests, does cmake -DTHRUST_IGNORE_CUB_VERSION_CHECK help?
We cannot drop support for CUDA < 11 yet, so we cannot remove src/acc/cuda/cub. Probably we need to make the #include conditional on the CUDA version.
Which package put /usr/include/thrust there? Do you really need it there?
Usually the CUDA SDK is installed in /opt/cuda-XX or /usr/local/cuda-XX and does not contaminate /usr/include.
I successfully compiled RELION 4.0 with CUDA in /home/software/packages/cuda-11.6 (thus we don't have thrust in /usr/include).
/usr/include/thrust/system/cuda/config.h belongs to libthrust-dev. libthrust-dev reverse depends on nvidia-cuda-dev, which is required by nvidia-cuda-toolkit. The CUDA toolkit is installed via the Debian package manager, so nvcc goes to /usr/bin and the libs to /usr/lib/x86_64-linux-gnu.
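For anyone wanting to repeat that check, here is a short sketch of asking the Debian package database which package installed a header. dpkg -S is the standard query; the function name and the fallback messages are illustrative only.

```shell
# Print the package that owns a file, on systems that use dpkg.
owner_of() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "$file: does not exist"
        return 0
    fi
    if command -v dpkg >/dev/null 2>&1; then
        dpkg -S "$file" 2>/dev/null || echo "$file: not owned by any package"
    else
        echo "dpkg not available; use your distribution's equivalent query"
    fi
    return 0
}

owner_of /usr/include/thrust/system/cuda/config.h
# apt-cache rdepends libthrust-dev would then list the packages that pull it in.
```

A header that no package owns was most likely copied in by hand or by a vendor installer, which is worth knowing before moving or deleting it.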
cmake -DTHRUST_IGNORE_CUB_VERSION_CHECK does not help
Did you test THRUST_IGNORE_CUB_VERSION_CHECK?
If I move away /usr/include/thrust, I get:
[ 2%] Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/cuda_kernels/relion_gpu_util_generated_helper.cu.o
In file included from /home/gsharov/soft/relion-4.0/src/acc/cuda/cub/device/device_reduce.cuh:41,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cuda_utils_cub.cuh:18,
from /home/gsharov/soft/relion-4.0/src/acc/cuda/cuda_projector_plan.cu:10:
/home/gsharov/soft/relion-4.0/src/acc/cuda/cub/device/../iterator/arg_index_input_iterator.cuh:44:10: fatal error: thrust/version.h: No such file or directory
44 | #include <thrust/version.h>
| ^~~~~~~~~~~~~~~~~~
compilation terminated.
CMake Error at relion_gpu_util_generated_cuda_projector_plan.cu.o.Release.cmake:220 (message):
Error generating
/home/gsharov/soft/relion-4.0/test/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/./relion_gpu_util_generated_cuda_projector_plan.cu.o
make[2]: *** [src/apps/CMakeFiles/relion_gpu_util.dir/build.make:126: src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o] Error 1
Don't move it, but try THRUST_IGNORE_CUB_VERSION_CHECK.
After another system update, running cmake -DCUDA_ARCH=86 -DFORCE_OWN_FFTW=ON -DAMDFFTW=ON -DCC=gcc-10 -DTHRUST_IGNORE_CUB_VERSION_CHECK=1 .. and then make worked. I don't understand what has changed; nothing CUDA-related was upgraded.
Let's hope it will keep compiling in the future, so I don't have to re-open this. Thanks for your help, @biochem-fan
I believe the original issue was solved by 554e0ed993e5ac8a3fee4be7c5cf64a62216a8c7.
I also think that, in the long term, shipping a version of cub while at the same time using the thrust that comes with CUDA and depends on cub is bound to cause unsolvable problems.
A possible solution would be to do something like:
#if (__CUDACC_VER_MAJOR__ < 11) || (__CUDACC_VER_MAJOR__ == 11 && __CUDACC_VER_MINOR__ < 2)
// Only use builtin CUB for those CUDA versions that don't bundle it
#include "src/acc/cuda/cub/device/device_radix_sort.cuh"
#include "src/acc/cuda/cub/device/device_reduce.cuh"
#include "src/acc/cuda/cub/device/device_scan.cuh"
#include "src/acc/cuda/cub/device/device_select.cuh"
#else
#include <cub/device/device_radix_sort.cuh>
#include <cub/device/device_reduce.cuh>
#include <cub/device/device_scan.cuh>
#include <cub/device/device_select.cuh>
#endif
in src/acc/cuda/cuda_utils_cub.cuh
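To predict which branch of that conditional a given toolkit would take, the same comparison can be mirrored in shell against the release field of nvcc --version. This is only a sketch of the proposed condition, not RELION code; both function names are invented for illustration.

```shell
# Mirror of the proposed preprocessor condition: the copy of CUB shipped in
# src/acc/cuda/cub is used only for CUDA older than 11.2.
cub_source_for() {
    major=$1
    minor=$2
    if [ "$major" -lt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -lt 2 ]; }; then
        echo "relion"
    else
        echo "toolkit"
    fi
}

# Read `nvcc --version` output on stdin and print "MAJOR MINOR"
# taken from its "release X.Y" field.
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' | head -n 1
}

# Apply the check to the local compiler, if there is a working one.
if command -v nvcc >/dev/null 2>&1; then
    release=$(nvcc --version 2>/dev/null | parse_nvcc_release)
    if [ -n "$release" ]; then
        cub_source_for "${release% *}" "${release#* }"
    fi
fi
```

With this condition, the CUDA 11.4 and 11.6 toolkits mentioned in this thread would both take the toolkit branch and include the CUB bundled with CUDA.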
I have a problem compiling relion4 with cuda 11
Environment:
Command: cmake -DCUDA_ARCH=86 -DFORCE_OWN_FFTW=ON -DAMDFFTW=ON .. && make -j 12 CMakeCache.txt Error: