NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
BSD 3-Clause "New" or "Revised" License
1.67k stars 447 forks source link

`error: use of undeclared identifier 'ProcessFloatMinusZero'` when compiling with clang #334

Closed zingdle closed 3 years ago

zingdle commented 3 years ago

When compiling example/device/example_device_radix_sort.cu with

clang++-11 -DTHRUST_IGNORE_CUB_VERSION_CHECK -I../.. --cuda-gpu-arch=sm_61 -std=c++17 -x cuda -c example_device_radix_sort.cu

it errors:

clang: warning: Unknown CUDA version 11.0. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
In file included from example_device_radix_sort.cu:46:
In file included from ../../cub/device/device_radix_sort.cuh:40:
In file included from ../../cub/device/dispatch/dispatch_radix_sort.cuh:40:
In file included from ../../cub/device/dispatch/../../agent/agent_radix_sort_histogram.cuh:38:
../../cub/device/dispatch/../../agent/../block/../block/radix_rank_sort_operations.cuh:124:20: error: use of undeclared identifier 'ProcessFloatMinusZero'
        return BFE(ProcessFloatMinusZero(key), bit_start, num_bits);
                   ^
                   this->
../../cub/device/dispatch/../../agent/../block/block_radix_rank.cuh:413:50: note: in instantiation of member function 'cub::BFEDigitExtractor<float>::Digit' requested here
            unsigned int digit = digit_extractor.Digit(keys[ITEM]);
                                                 ^
../../cub/device/dispatch/../../block/block_radix_sort.cuh:265:72: note: in instantiation of function template specialization 'cub::BlockRadixRank<256, 6, false, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 610>::RankKeys<unsigned int, 19, cub::BFEDigitExtractor<float>>' requested here
        AscendingBlockRadixRank(temp_storage.asending_ranking_storage).RankKeys(
                                                                       ^
../../cub/device/dispatch/../../block/block_radix_sort.cuh:404:13: note: in instantiation of member function 'cub::BlockRadixSort<float, 256, 19, int, 6, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 610>::RankKeys' requested here
            RankKeys(unsigned_keys, ranks, digit_extractor, is_descending);
            ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:334:40: note: in instantiation of function template specialization 'cub::BlockRadixSort<float, 256, 19, int, 6, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 610>::SortBlockedToStriped<0, 0>' requested here
    BlockRadixSortT(temp_storage.sort).SortBlockedToStriped(
                                       ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:1573:17: note: in instantiation of function template specialization 'cub::DeviceRadixSortSingleTileKernel<cub::DeviceRadixSortPolicy<float, int, int>::Policy800, false, float, int, int>' requested here
                DeviceRadixSortSingleTileKernel<MaxPolicyT, IS_DESCENDING, KeyT, ValueT, OffsetT>);
                ^
../../cub/util_device.cuh:706:28: note: (skipping 5 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        return op.template Invoke<PolicyT>();
                           ^
../../cub/util_device.cuh:689:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<620, cub::DeviceRadixSortPolicy<float, int, int>::Policy620, cub::DeviceRadixSortPolicy<float, int, int>::Policy610>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
           return PrevPolicyT::Invoke(ptx_version, op);
                               ^
../../cub/util_device.cuh:689:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<700, cub::DeviceRadixSortPolicy<float, int, int>::Policy700, cub::DeviceRadixSortPolicy<float, int, int>::Policy620>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
../../cub/device/dispatch/dispatch_radix_sort.cuh:1619:46: note: in instantiation of function template specialization 'cub::ChainedPolicy<800, cub::DeviceRadixSortPolicy<float, int, int>::Policy800, cub::DeviceRadixSortPolicy<float, int, int>::Policy700>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
            if (CubDebug(error = MaxPolicyT::Invoke(ptx_version, dispatch))) break;
                                             ^
../../cub/device/device_radix_sort.cuh:321:65: note: in instantiation of member function 'cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>::Dispatch' requested here
        return DispatchRadixSort<false, KeyT, ValueT, OffsetT>::Dispatch(
                                                                ^
example_device_radix_sort.cu:190:35: note: in instantiation of function template specialization 'cub::DeviceRadixSort::SortPairs<float, int>' requested here
    CubDebugExit(DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items));
                                  ^
../../cub/device/dispatch/../../agent/../block/../block/radix_rank_sort_operations.cuh:98:52: note: must qualify identifier to find this declaration in dependent base class
    static __device__ __forceinline__ UnsignedBits ProcessFloatMinusZero(UnsignedBits key)
                                                   ^
../../cub/device/dispatch/../../agent/../block/../block/radix_rank_sort_operations.cuh:143:25: error: use of undeclared identifier 'ProcessFloatMinusZero'
        return uint32_t(ProcessFloatMinusZero(key) >> UnsignedBits(bit_start)) & mask;
                        ^
                        this->
../../cub/device/dispatch/../../agent/agent_radix_sort_histogram.cuh:194:43: note: in instantiation of member function 'cub::ShiftDigitExtractor<float>::Digit' requested here
                int bin = digit_extractor.Digit(keys[u]);
                                          ^
../../cub/device/dispatch/../../agent/agent_radix_sort_histogram.cuh:231:13: note: in instantiation of member function 'cub::AgentRadixSortHistogram<cub::AgentRadixSortHistogramPolicy<256, 8, 8, float, 8>, false, float, int>::AccumulateSharedHistograms' requested here
            AccumulateSharedHistograms(tile_offset, keys);
            ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:544:11: note: in instantiation of member function 'cub::AgentRadixSortHistogram<cub::AgentRadixSortHistogramPolicy<256, 8, 8, float, 8>, false, float, int>::Process' requested here
    agent.Process();
          ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:1325:37: note: in instantiation of function template specialization 'cub::DeviceRadixSortHistogramKernel<cub::DeviceRadixSortPolicy<float, int, int>::Policy800, false, float, int>' requested here
            auto histogram_kernel = DeviceRadixSortHistogramKernel<
                                    ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:1557:16: note: in instantiation of function template specialization 'cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>::InvokeOnesweep<cub::DeviceRadixSortPolicy<float, int, int>::Policy600>' requested here
        return InvokeOnesweep<ActivePolicyT>();
               ^
../../cub/device/dispatch/dispatch_radix_sort.cuh:1578:20: note: (skipping 4 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
            return InvokeManyTiles<ActivePolicyT>(Int2Type<ActivePolicyT::ONESWEEP>());
                   ^
../../cub/util_device.cuh:689:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<620, cub::DeviceRadixSortPolicy<float, int, int>::Policy620, cub::DeviceRadixSortPolicy<float, int, int>::Policy610>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
           return PrevPolicyT::Invoke(ptx_version, op);
                               ^
../../cub/util_device.cuh:689:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<700, cub::DeviceRadixSortPolicy<float, int, int>::Policy700, cub::DeviceRadixSortPolicy<float, int, int>::Policy620>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
../../cub/device/dispatch/dispatch_radix_sort.cuh:1619:46: note: in instantiation of function template specialization 'cub::ChainedPolicy<800, cub::DeviceRadixSortPolicy<float, int, int>::Policy800, cub::DeviceRadixSortPolicy<float, int, int>::Policy700>::Invoke<cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>>' requested here
            if (CubDebug(error = MaxPolicyT::Invoke(ptx_version, dispatch))) break;
                                             ^
../../cub/device/device_radix_sort.cuh:321:65: note: in instantiation of member function 'cub::DispatchRadixSort<false, float, int, int, cub::DeviceRadixSortPolicy<float, int, int>>::Dispatch' requested here
        return DispatchRadixSort<false, KeyT, ValueT, OffsetT>::Dispatch(
                                                                ^
example_device_radix_sort.cu:190:35: note: in instantiation of function template specialization 'cub::DeviceRadixSort::SortPairs<float, int>' requested here
    CubDebugExit(DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items));
                                  ^
../../cub/device/dispatch/../../agent/../block/../block/radix_rank_sort_operations.cuh:98:52: note: must qualify identifier to find this declaration in dependent base class
    static __device__ __forceinline__ UnsignedBits ProcessFloatMinusZero(UnsignedBits key)
                                                   ^
2 errors generated when compiling for sm_61.

clang and gcc use two phase name lookup so

https://github.com/NVIDIA/cub/blob/1877cc09bb5292207fda082628c1eb59f76885e6/cub/block/radix_rank_sort_operations.cuh#L124

and

https://github.com/NVIDIA/cub/blob/1877cc09bb5292207fda082628c1eb59f76885e6/cub/block/radix_rank_sort_operations.cuh#L143

should use this->ProcessFloatMinusZero(key)

But somehow nvcc accpet the follwing code:

template <typename T>
struct Base {
  __device__ T foo(T x) { return x; }
};

template <typename T>
struct Derived : Base<T> {
  __device__ T bar(T x) {
    return foo(x) + 1;
  }
};
alliepiper commented 3 years ago

FYI, Clang CUDA support in CUB is maintained by the community, we don't provide any guarantees or have CI for this compiler.

That said, the proposed fix sounds good to me. Could you submit a PR for this?

zingdle commented 3 years ago

I've submitted a PR at https://github.com/NVIDIA/cub/pull/335 and PTAL.

benbarsdell commented 3 years ago

Unfortunately this also breaks NVRTC :(

zingdle commented 3 years ago

@benbarsdell Do you mean the patch breaks NVRTC? I have no experience with NVRTC but this-> is supported in NVCC. Does NVCC and NVRTC use different complie process?

benbarsdell commented 3 years ago

Sorry for the confusion, the patch works in NVRTC. I just wanted to note that it's not just clang that doesn't like the old code.