NVIDIA / cccl

CUDA Core Compute Libraries
Other
1.11k stars 129 forks source link

[BUG]: Regression: Using Thrust OpenMP and CUDA back ends at the same time #2098

Closed dkolsen-pgi closed 1 week ago

dkolsen-pgi commented 1 month ago

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

It used to be possible, at least through CCCL 2.3.0, to use the Thrust OpenMP and CUDA back ends in the same compilation, choosing which back end to use at each algorithm call by specifying different execution policies, thrust::omp::par and thrust::cuda::par respectively. This was done by defining

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA

before including any Thrust headers.

This no longer works with the CCCL main branch. It results in compilation errors. See the reproduction steps below.

This Thrust feature is necessary to implement nvc++ -stdpar=gpu,multicore, which is a much requested feature that we are hoping to implement for the next NVC++ release.

How to Reproduce

Test file:

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA
#include <thrust/execution_policy.h>
int main() { }

To reproduce with the NVC++ dev compiler, which uses the latest CCCL main branch.

$ nvc++ -stdpar thrust.cpp
"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/execution_policy.h", line 58: error: namespace "thrust::THRUST_200600_86_NS::system::cpp::detail" has no member "par_t"
  using host_t = thrust::system::cpp::detail::par_t;
                                              ^

"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/execution_policy.h", line 291: error: const variable "thrust::THRUST_200600_86_NS::host" requires an initializer
  static const detail::host_t host;
                                  ^

2 errors detected in the compilation of "thrust.cpp".

It can also be reproduced with NVCC (since many don't have easy access to the NVC++ dev compiler). The paths here are in the NVHPC environment, but it should be easy enough to change the paths to work in the CCCL dev environment.

$ /proj/cuda/12.5/Linux_x86_64/bin/nvcc -x cu -I/proj/cuda/cccl/main/thrust -I/proj/cuda/cccl/main/cub -I /proj/cuda/cccl/main/libcudacxx/include thrust.cpp
/proj/cuda/cccl/main/thrust/thrust/execution_policy.h(58): error: namespace "thrust::THRUST_200600_520_NS::system::cpp::detail" has no member "par_t"
  using host_t = thrust::system::cpp::detail::par_t;
                                              ^

/proj/cuda/cccl/main/thrust/thrust/execution_policy.h(291): error: const variable "thrust::THRUST_200600_520_NS::host" requires an initializer
  static const detail::host_t host;
                                  ^

2 errors detected in the compilation of "thrust.cpp".

Expected behavior

Everything should just work when using Thrust after:

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

jrhemstad commented 1 month ago

Confirmed reproducer on CCCL main and nvcc 12.4 https://godbolt.org/z/G1cs38xYb

jrhemstad commented 1 month ago

@alliepiper would you mind looking into this?

dkolsen-pgi commented 1 week ago

This is blocking my implementation of nvc++ -stdpar=gpu,multicore. I was hoping to check in the feature this week, but I can't check it in to the NVHPC dev branch until this issue is fixed in the CCCL main branch. And we can't release the feature, scheduled for HPC SDK 24.9 in late September, until the fix for this issue is in a CCCL release.

So this issue is urgent.

(My apologies for not speaking up about this sooner. I was on vacation for a couple weeks and forgot about this.)

alliepiper commented 1 week ago

Taking a look now.

The break bisects to 91b78d8f8.

alliepiper commented 1 week ago

@dkolsen-pgi Found the culprit: https://github.com/NVIDIA/cccl/commit/91b78d8f817428939b7ff68a149d124344bc5d12#diff-75486465010ffdbcdb7b6a6ab5fdd365d5b3fd9d8cd4a5725b1ebd74a55eabdc

Fix is here: https://github.com/NVIDIA/cccl/pull/2332

bernhardmgruber commented 1 week ago

If this is an important use case for nvc++, can we extend the CI to cover this scenario? And if it is already covered, do we know why the CI missed this issue? Never mind, just saw the issue: #2333

dkolsen-pgi commented 1 week ago

Even if CCCL had excellent CI coverage for NVC++, it wouldn't have caught this bug. This bug only affects a new feature that I am working on, which is not in production yet. The bug may have been caught if #2333 had been implemented, but it wouldn't have been caught by NVC++ coverage.

dkolsen-pgi commented 1 week ago

Thanks for the fix, Allison. Compilation gets farther along. But it runs into another error that looks like the same problem.

"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/mr/host_memory_resource.h", line 36: error: namespace "thrust::THRUST_200600_86_NS::system::cpp" has no member "memory_resource"
  using host_memory_resource = thrust::system::cpp::memory_resource;
                                                    ^

That line in <thrust/mr/host_memory_resource.h> looks wrong and I assume should have the same fix.

I also found three other uses of thrust::system::cpp::detail::execution_policy in files that are not in thrust/system/cpp. I don't know if thrust::system::cpp::detail::execution_policy exists when the host system is not CPP, so I don't know if these are actual problems. The files in question are thrust/detail/overlapped_copy.h, thrust/system/omp/detail/execution_policy.h, and thrust/system/tbb/detail/execution_policy.h.

bernhardmgruber commented 1 week ago

I looked into the offending commit using git show 91b78d8 | grep -C 3 __THRUST_HOST_SYSTEM_NAMESPACE and I find two occurences where __THRUST_HOST_SYSTEM_NAMESPACE was erroneously replaced by cpp. One of them was fixes in #2332, the other one is proposed here: #2341

I also found three other uses of thrust::system::cpp::detail::execution_policy in files that are not in thrust/system/cpp. I don't know if thrust::system::cpp::detail::execution_policy exists when the host system is not CPP, so I don't know if these are actual problems. The files in question are thrust/detail/overlapped_copy.h, thrust/system/omp/detail/execution_policy.h, and thrust/system/tbb/detail/execution_policy.h.

I looked at those files and annotated the locations with git blame, but those locations were not touched in commit 91b78d8. Notice that the offending commit only rewrote typedef to using.