RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0

Failing to build with CUDA 12 - error: more than one instance of overloaded function "max" matches the argument list #227

Status: Open. Delicates opened this issue 6 days ago.

Delicates commented 6 days ago

oidn-2.2.* and oidn-2.3.0 fail to build with CUDA 12:

-- The CXX compiler identification is GNU 14.1.1
-- The CUDA compiler identification is NVIDIA 12.3.107

...

[113/116] /usr/bin/cmake -E cmake_symlink_library libOpenImageDenoise_device_cpu.so.2.3.0 libOpenImageDenoise_device_cpu.so.2.3.0 libOpenImageDenoise_device_cpu.so && :
[114/116] cd oidn-2.3.0/work/oidn-2.3.0_build/devices/cuda/build && /usr/bin/cmake --build .
FAILED: devices/cuda/stamp/OpenImageDenoise_device_cuda-build oidn-2.3.0/work/oidn-2.3.0_build/devices/cuda/stamp/OpenImageDenoise_device_cuda-build 
cd oidn-2.3.0/work/oidn-2.3.0_build/devices/cuda/build && /usr/bin/cmake --build .
[1/12] Building CXX object CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_external_buffer.cpp.o
[2/12] Building CXX object CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_module.cpp.o
[3/12] Building CXX object CMakeFiles/curtn.dir/curtn.cpp.o
[4/12] Linking CXX static library oidn-2.3.0/work/oidn-2.3.0_build/libcurtn.a
[5/12] Building CXX object CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_device.cpp.o
[6/12] Building CUDA object CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_engine.cu.o
FAILED: CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_engine.cu.o 
/opt/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/x86_64-pc-linux-gnu/gcc-bin/12 -DOIDN_DEVICE_CUDA_API_DRIVER -DOpenImageDenoise_device_cuda_EXPORTS -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -Ioidn-2.3.0/work/oidn-2.3.0/devices/cuda/../../external/cutlass/include -Ioidn-2.3.0/work/oidn-2.3.0/devices/cuda/../../external/cutlass/tools/util/include -isystem /opt/cuda/targets/x86_64-linux/include -isystem oidn-2.3.0/work/oidn-2.3.0 -isystem oidn-2.3.0/work/oidn-2.3.0/external -isystem oidn-2.3.0/work/oidn-2.3.0_build -O2 -g -DNDEBUG -std=c++11 -Xcompiler=-fPIC -fvisibility=internal -fvisibility-inlines-hidden -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -MD -MT CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_engine.cu.o -MF CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_engine.cu.o.d -x cu -c oidn-2.3.0/work/oidn-2.3.0/devices/cuda/cuda_engine.cu -o CMakeFiles/OpenImageDenoise_device_cuda.dir/cuda_engine.cu.o
oidn-2.3.0/work/oidn-2.3.0/core/math.h(25): error: more than one instance of overloaded function "max" matches the argument list:
            function "max(int, int)" (declared at line 416 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.h)
            function "max(unsigned int, unsigned int)" (declared at line 993 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(int, unsigned int)" (declared at line 998 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(unsigned int, int)" (declared at line 1003 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(long, long)" (declared at line 1008 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(unsigned long, unsigned long)" (declared at line 1026 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(long, unsigned long)" (declared at line 1043 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(unsigned long, long)" (declared at line 1060 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(long long, long long)" (declared at line 1077 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(unsigned long long, unsigned long long)" (declared at line 1082 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(long long, unsigned long long)" (declared at line 1087 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(unsigned long long, long long)" (declared at line 1092 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            function "max(float, float)" (declared at line 1097 of /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp)
            argument types are: (half, half)
    template<typename T> __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) T max(T a, T b) { return ::max(a, b); }
                                                                                                                                      ^
          detected during:
            instantiation of "T oidn::math::max(T, T) [with T=half]" at line 50 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/../gpu/gpu_pool.h
            instantiation of "void oidn::GPUPoolKernel<T, oidn::TensorLayout::hwc>::operator()(const oidn::WorkItem<3> &) const [with T=half]" at line 39 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/cuda_engine.h
            instantiation of "void oidn::<unnamed>::basicCUDAKernel(oidn::WorkDim<3>, Kernel) [with Kernel=oidn::GPUPoolKernel<half, oidn::TensorLayout::hwc>]" at line 105 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/cuda_engine.h
            instantiation of "void oidn::CUDAEngine::submitKernel(oidn::WorkDim<N>, const Kernel &) [with N=3, Kernel=oidn::GPUPoolKernel<half, oidn::TensorLayout::hwc>]" at line 73 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/../gpu/gpu_pool.h
            instantiation of "void oidn::GPUPool<EngineT, SrcDstT, srcDstLayout>::submit() [with EngineT=oidn::CUDAEngine, SrcDstT=half, srcDstLayout=oidn::TensorLayout::hwc]" at line 61 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/../gpu/gpu_pool.h
            implicit generation of "oidn::GPUPool<EngineT, SrcDstT, srcDstLayout>::~GPUPool() [with EngineT=oidn::CUDAEngine, SrcDstT=half, srcDstLayout=oidn::TensorLayout::hwc]" at line 61 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/../gpu/gpu_pool.h
            instantiation of class "oidn::GPUPool<EngineT, SrcDstT, srcDstLayout> [with EngineT=oidn::CUDAEngine, SrcDstT=half, srcDstLayout=oidn::TensorLayout::hwc]" at line 61 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/../gpu/gpu_pool.h
            instantiation of "oidn::GPUPool<EngineT, SrcDstT, srcDstLayout>::GPUPool(EngineT *, const oidn::PoolDesc &) [with EngineT=oidn::CUDAEngine, SrcDstT=half, srcDstLayout=oidn::TensorLayout::hwc]" at line 146 of oidn-2.3.0/work/oidn-2.3.0/core/ref.h
            instantiation of "oidn::Ref<T> oidn::makeRef<T,Args...>(Args &&...) [with T=oidn::GPUPool<oidn::CUDAEngine, half, oidn::TensorLayout::hwc>, Args=<oidn::CUDAEngine *, const oidn::PoolDesc &>]" at line 46 of oidn-2.3.0/work/oidn-2.3.0/devices/cuda/cuda_engine.cu
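Some context on the message: the template at core/math.h(25) forwards to the global `::max`, and every candidate nvcc lists takes integers or `float`. None takes `half`. The plain C++ sketch below reproduces this kind of ambiguity. It rests on one assumption: a hypothetical `Half` type that converts implicitly to both `int` and `float`, standing in for CUDA's `half`. The sketch also shows one comparison-based alternative. `Half`, the global `max` overloads, `has_global_max` and `safe_max` are illustrative names only, not OIDN or CUDA code. The sketch does not explain why no `half` overload was visible in this particular build.

```cpp
#include <cassert>      // for the usage asserts
#include <type_traits>
#include <utility>

// Hypothetical stand-in for CUDA's `half` (NOT the real __half layout): it
// stores a float and, by assumption, converts implicitly to both int and float.
struct Half {
  float value;
  explicit Half(float v) : value(v) {}
  operator float() const { return value; }
  operator int() const { return static_cast<int>(value); }
  // Exact-match comparison, so `a < b` does not go through the conversions.
  friend bool operator<(const Half& a, const Half& b) { return a.value < b.value; }
};

// Hypothetical global overload set mimicking the candidates in the nvcc
// error: integer and float overloads exist, but none takes Half.
inline int max(int a, int b) { return a < b ? b : a; }
inline float max(float a, float b) { return a < b ? b : a; }

// True if ::max(T, T) resolves to exactly one overload. An ambiguous call
// inside decltype is a substitution failure, so the primary template is used.
template <typename T, typename = void>
struct has_global_max : std::false_type {};

template <typename T>
struct has_global_max<T, decltype(void(::max(std::declval<T>(), std::declval<T>())))>
    : std::true_type {};

// float matches max(float, float) exactly. Half needs a user-defined
// conversion for both overloads, via different operators, so neither wins.
static_assert(has_global_max<float>::value, "max(float, float) is selected");
static_assert(!has_global_max<Half>::value, "int and float overloads tie for Half");

// One possible workaround pattern (an assumption, not OIDN's actual fix):
// compare with operator< instead of forwarding to the global ::max overloads.
template <typename T>
T safe_max(T a, T b) { return a < b ? b : a; }
```

For example, `safe_max(Half(1.5f), Half(2.5f)).value` is `2.5f` and `safe_max(3, 7)` is `7`. The example only shows why forwarding to an overload set with no exact match can turn into a tie. It does not say how OIDN should resolve it.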
atafra commented 5 days ago

I cannot reproduce this issue with CUDA 12.5 on Ubuntu 22.04 (GCC 11.4.0) or 24.04 (GCC 13.2.0).

You're not using the latest CUDA version. Have you tried updating to 12.5? Also, please note that not even the latest CUDA supports your GCC version (14.1.1); the latest officially supported GCC version is 13.2.

I would suggest trying a supported combination of CUDA and GCC.

Delicates commented 5 days ago

I tried GCC 12, 13 and 14 and got the same error with all of them. The error is generated by nvcc, not by GCC, and the overloaded function is declared in the CUDA headers. I can't upgrade CUDA to 12.5 yet because of another blocker.

atafra commented 5 days ago

I can't reproduce the issue with the same CUDA 12.3.107 version either, using a clean Docker image. The error message doesn't make much sense to me anyway. My best guess is that there might be an issue with your CUDA installation.

Delicates commented 5 days ago

I tried reinstalling CUDA; it didn't help.

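My naive guess is that the compiler doesn't know which of these two include files to use for the max() function definition: /opt/cuda/targets/x86_64-linux/include/crt/math_functions.h or /opt/cuda/targets/x86_64-linux/include/crt/math_functions.hpp.

Is the issue that the build should be using only one of these files, but something pulls in both? This is on Gentoo, by the way.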

cmake -C oidn-2.3.0/work/oidn-2.3.0_build/gentoo_common_config.cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/usr -DOIDN_APPS=no -DOIDN_DEVICE_CPU=yes -DOIDN_DEVICE_CUDA=yes -DOIDN_DEVICE_HIP=no -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_TOOLCHAIN_FILE=oidn-2.3.0/work/oidn-2.3.0_build/gentoo_toolchain.cmake oidn-2.3.0/work/oidn-2.3.0
loading initial cache file oidn-2.3.0/work/oidn-2.3.0_build/gentoo_common_config.cmake
-- The C compiler identification is GNU 12.4.0
-- The CXX compiler identification is GNU 12.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/x86_64-pc-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/x86_64-pc-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python: oidn-2.3.0/temp/python3.13/bin/python3 (found version "3.13.0") found components: Interpreter
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found Intel SPMD Compiler (ISPC): /usr/bin/ispc
-- <<< Gentoo configuration >>>
Build type      RelWithDebInfo
Install path    /usr
Compiler flags:
C               -w -march=native -O2 -pipe  -Wall -Wno-unknown-pragmas -Wno-strict-overflow -fPIC -Wformat -Wformat-security -Wmissing-field-initializers
C++             -w -march=native -O2 -pipe  -Wall -Wno-unknown-pragmas -Wno-strict-overflow -fPIC -Wformat -Wformat-security -Wmissing-field-initializers 
Linker flags:
Executable      -Wl,-O1,--as-needed -pie -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now
Module          -Wl,-O1,--as-needed
Shared          -Wl,-O1,--as-needed -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now

This is the only patch Gentoo applies: https://gitweb.gentoo.org/repo/gentoo.git/tree/media-libs/oidn/files/oidn-2.2.2-amdgpu-targets.patch

atafra commented 5 days ago

Gentoo may not apply any relevant patches, but it does apply custom build options. Could you please try building OIDN from the unmodified original source without using any CMake config/toolchain files?