[BUG]: Compilation error when both NVTX v2 and CUB are included

dkolsen-pgi commented 2 weeks ago

Is this a duplicate?

[x] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct

Type of Bug

Compile-time Error

Component

CUB

Describe the bug

As of CUDA 12.6, CUB includes NVTX version 3 headers. If a source file includes an earlier version of NVTX and then includes a CUB header, then the NVTX version 3 header will #error out with "Trying to #include NVTX version 3 in a source file where an older NVTX version has already been included." The CUDA Toolkit contains headers for NVTX version 2, so it is quite easy for users to hit this compilation error. (See the steps to reproduce for a trivial example.)

How to Reproduce

$ cat nvtx.cu
#include <nvToolsExt.h>
#include <thrust/reduce.h>
int main() { }

$ /proj/cuda/12.6/Linux_x86_64/bin/nvcc nvtx.cu
In file included from /proj/cuda/12.6.0.560/Linux_x86_64/targets/x86_64-linux/include/cub/detail/nvtx3.hpp:635,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/cub/detail/nvtx.cuh:49,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/cub/device/device_reduce.cuh:46,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h:43,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/thrust/system/detail/adl/reduce.h:50,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl:31,
                 from /proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/thrust/reduce.h:763,
                 from nvtx.cu:2:
/proj/cuda/12.6/Linux_x86_64/bin/../targets/x86_64-linux/include/nvtx3/nvToolsExt.h:179:2: error: #error "Trying to #include NVTX version 3 in a source file where an older NVTX version has already been included.  If you are not directly using NVTX (the NVIDIA Tools Extension library), you are getting this error because libraries you are using have included different versions of NVTX.  Suggested solutions are: (1) reorder #includes so the newest NVTX version is included first, (2) avoid using the conflicting libraries in the same .c/.cpp file, or (3) update the library using the older NVTX version to use the newer version instead."
  179 | #error "Trying to #include NVTX version 3 in a source file where an older NVTX version has already been included.  If you are not directly using NVTX (the NVIDIA Tools Extension library), you are getting this error because libraries you are using have included different versions of NVTX.  Suggested solutions are: (1) reorder #includes so the newest NVTX version is included first, (2) avoid using the conflicting libraries in the same .c/.cpp file, or (3) update the library using the older NVTX version to use the newer version instead."
      |  ^~~~~

Expected behavior

CUB should not fail with a compilation error if <nvToolsExt.h> has been included before CUB is included.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2024 NVIDIA Corporation Built on Fri_Jun_14_16:34:21_PDT_2024 Cuda compilation tools, release 12.6, V12.6.20 Build cuda_12.6.r12.6/compiler.34431801_0

jrhemstad commented 2 weeks ago

I don't really see what we can do above and beyond what we already provide. The nvToolsExt.h header is deprecated and slated to be removed from the CTK.

A person's options are:

Don't include nvToolsExt.h
Disable CCCL NVTX ranges with -DCCCL_DISABLE_NVTX

dkolsen-pgi commented 2 weeks ago

CUB can deal with this by adding something like this to the CUB header:

#if defined(NVTX_VERSION) && NVTX_VERSION < 3
#define CCCL_DISABLE_NVTX 1
#endif

bernhardmgruber commented 2 weeks ago

The situation is actually more complex. While we can workaround your example by disabling NVTX3 inside CUB:

#include <nvToolsExt.h>
// NVTX_VERSION is defined to 2
#include <thrust/reduce.h> // CUB sees NVTX_VERSION == 2 and disables NVTX3
int main() { }

We cannot workaround a swapped order of includes:

#include <thrust/reduce.h> // CUB sees no NVTX_VERSION and includes NVTX3
#include <nvToolsExt.h> // clashes with NVTX3 symbols
int main() { }

See: https://cuda.godbolt.org/z/PTq64P4dr

Given we can only provide half a workaround, I would rather not have it at all. The final order of includes can be unpredictable sometimes, so users would only be surprised that they can use NVTX2 in some TUs and not in others. Or a refactoring of their headers would suddently break, because NVTX2 and NVTX3 include order changed. That sounds very brittle. I think users should rather take note of mixing APIs deterministically and start to migrate away from NVTX2.

NVIDIA / cccl