NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/
Other
1.14k stars 135 forks source link

[BUG]: Thrust can'bt be used in host code #1248

Open wrvsrx opened 8 months ago

wrvsrx commented 8 months ago

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

If thrust/reduce.h is inclued in host code, following cpp file can't be compiled using g++ 12.3.

#include <thrust/reduce.h>
auto main() -> int {}

According to git bisect, this bug is caused by 7395463758994ff95f2d583fb7d537fbbbf1c262. According to https://github.com/NVIDIA/cccl/blob/7395463758994ff95f2d583fb7d537fbbbf1c262/cub/cub/util_device.cuh#L34, cub/cub/util_device.cuh should only contain host function. However, this commit introduce device function and <cub/util_ptx.cuh> into it, which cause the bug.

How to Reproduce

  1. create a main.cpp with following contents:
#include "thrust/reduce.h"
auto main() -> int {}
  1. g++ -Ithrust -Ilibcudacxx/include -Icub main.cpp -o main
  2. Compilation fails with following message:
❯ g++ -Ithrust -Ilibcudacxx/include -Icub main.cpp -o main
In file included from cub/cub/util_device.cuh:52,
                 from thrust/thrust/system/cuda/detail/util.h:48,
                 from thrust/thrust/system/cuda/detail/internal/copy_cross_system.h:49,
                 from thrust/thrust/system/cuda/detail/copy.h:111,
                 from thrust/thrust/system/detail/adl/copy.h:50,
                 from thrust/thrust/detail/copy.inl:31,
                 from thrust/thrust/detail/copy.h:98,
                 from thrust/thrust/detail/allocator/copy_construct_range.inl:31,
                 from thrust/thrust/detail/allocator/copy_construct_range.h:53,
                 from thrust/thrust/detail/contiguous_storage.inl:31,
                 from thrust/thrust/detail/contiguous_storage.h:243,
                 from thrust/thrust/detail/temporary_array.h:47,
                 from thrust/thrust/system/detail/generic/scan_by_key.inl:35,
                 from thrust/thrust/system/detail/generic/scan_by_key.h:150,
                 from thrust/thrust/detail/scan.inl:32,
                 from thrust/thrust/scan.h:1664,
                 from thrust/thrust/system/detail/generic/reduce_by_key.inl:40,
                 from thrust/thrust/system/detail/generic/reduce_by_key.h:95,
                 from thrust/thrust/detail/reduce.inl:33,
                 from thrust/thrust/reduce.h:789,
                 from main.cpp:1:
cub/cub/util_ptx.cuh: In function ‘void cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC()’:
cub/cub/util_ptx.cuh:277:5: error: ‘__syncthreads’ was not declared in this scope
  277 |     __syncthreads();
      |     ^~~~~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC_AND(int)’:
cub/cub/util_ptx.cuh:286:12: error: ‘__syncthreads_and’ was not declared in this scope
  286 |     return __syncthreads_and(p);
      |            ^~~~~~~~~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC_OR(int)’:
cub/cub/util_ptx.cuh:295:12: error: ‘__syncthreads_or’ was not declared in this scope
  295 |     return __syncthreads_or(p);
      |            ^~~~~~~~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘void cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_SYNC(unsigned int)’:
cub/cub/util_ptx.cuh:304:5: error: ‘__syncwarp’ was not declared in this scope
  304 |     __syncwarp(member_mask);
      |     ^~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_ANY(int, unsigned int)’:
cub/cub/util_ptx.cuh:313:12: error: ‘__any_sync’ was not declared in this scope
  313 |     return __any_sync(member_mask, predicate);
      |            ^~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_ALL(int, unsigned int)’:
cub/cub/util_ptx.cuh:322:12: error: ‘__all_sync’ was not declared in this scope
  322 |     return __all_sync(member_mask, predicate);
      |            ^~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_BALLOT(int, unsigned int)’:
cub/cub/util_ptx.cuh:331:12: error: ‘__ballot_sync’ was not declared in this scope
  331 |     return __ballot_sync(member_mask, predicate);
      |            ^~~~~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘unsigned int cub::CUB_200300___CUDA_ARCH_LIST___NS::SHFL_IDX_SYNC(unsigned int, int, unsigned int)’:
cub/cub/util_ptx.cuh:374:12: error: ‘__shfl_sync’ was not declared in this scope
  374 |     return __shfl_sync(member_mask, word, src_lane);
      |            ^~~~~~~~~~~
cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::RowMajorTid(int, int, int)’:
cub/cub/util_ptx.cuh:421:39: error: ‘threadIdx’ was not declared in this scope
  421 |     return ((block_dim_z == 1) ? 0 : (threadIdx.z * block_dim_x * block_dim_y)) +
      |                                       ^~~~~~~~~
In file included from libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx.h:31,
                 from libcudacxx/include/cuda/discard_memory:16,
                 from cub/cub/util_device.cuh:57:
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h: In function ‘uint32_t cuda::ptx::__4::__as_ptr_smem(const void*)’:
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h:30:44: error: ‘__cvta_generic_to_shared’ was not declared in this scope
   30 |   return static_cast<_CUDA_VSTD::uint32_t>(__cvta_generic_to_shared(__ptr));
      |                                            ^~~~~~~~~~~~~~~~~~~~~~~~
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h: In function ‘uint32_t cuda::ptx::__4::__as_ptr_remote_dsmem(const void*)’:
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h:37:44: error: ‘__cvta_generic_to_shared’ was not declared in this scope
   37 |   return static_cast<_CUDA_VSTD::uint32_t>(__cvta_generic_to_shared(__ptr));
      |                                            ^~~~~~~~~~~~~~~~~~~~~~~~
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h: In function ‘uint64_t cuda::ptx::__4::__as_ptr_gmem(const void*)’:
libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/../__cuda/ptx/ptx_helper_functions.h:43:44: error: ‘__cvta_generic_to_global’ was not declared in this scope
   43 |   return static_cast<_CUDA_VSTD::uint64_t>(__cvta_generic_to_global(__ptr));
      |                                            ^~~~~~~~~~~~~~~~~~~~~~~~
cub/cub/util_device.cuh: In static member function ‘static typename std::conditional<cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl<DefaultAgentPolicyT, DefaultAgentT, FallbackAgentPolicyT, FallbackAgentT>::uses_fallback_policy, FallbackAgentT, DefaultAgentT>::type::TempStorage& cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl<DefaultAgentPolicyT, DefaultAgentT, FallbackAgentPolicyT, FallbackAgentT>::get_temp_storage(cub::CUB_200300___CUDA_ARCH_LIST___NS::NullType&, cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_t&)’:
cub/cub/util_device.cuh:175:63: error: ‘blockIdx’ was not declared in this scope
  175 |       static_cast<char*>(vsmem.gmem_ptr) + (vsmem_per_block * blockIdx.x));
      |                                                               ^~~~~~~~
cub/cub/util_device.cuh: In static member function ‘static bool cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl<DefaultAgentPolicyT, DefaultAgentT, FallbackAgentPolicyT, FallbackAgentT>::discard_temp_storage(typename std::conditional<uses_fallback_policy, FallbackAgentT, DefaultAgentT>::type::TempStorage&)’:
cub/cub/util_device.cuh:204:39: error: ‘threadIdx’ was not declared in this scope
  204 |     const std::size_t linear_tid    = threadIdx.x;
      |                                       ^~~~~~~~~
cub/cub/util_device.cuh:205:51: error: ‘blockDim’ was not declared in this scope
  205 |     const std::size_t block_stride  = line_size * blockDim.x;

Expected behavior

It compiles succesfully.

Reproduction link

No response

Operating System

NixOS 24.05pre-git (Uakari) x86_64

nvidia-smi output

It's not related to nvidia driver.

NVCC version

It's not related to nvcc.

rserban commented 2 months ago

Any update/follow-up on this?

bernhardmgruber commented 2 months ago

Hi! We are sorry this causes problems on your end! Unfortunately, this is expected. See also this issue: #1374. Here is the relevant snippet: