Closed emankov closed 1 week ago
The first type of error is a missing template argument list after the template keyword:
/usr/local/cub-2.1.0/cub/block/specializations/../../block/../block/specializations/../../warp/specializations/../../thread/thread_store.cuh:337:56: error:
a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
337 | IterateThreadStore<0, VOLATILE_MULTIPLE>::template Dereference(
| ^
1 error generated when compiling for host.
The error occurs with LLVM >= 19 and all CUB versions. LLVM <= 18 does not report it.
Fixed by #1535.
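For readers unfamiliar with the diagnostic, here is a minimal sketch of the pattern, not the actual CUB code and not necessarily how #1535 handles it. The names `Iterate`, `Dereference` and `Load` are hypothetical stand-ins: a member template is called through a dependent type with the `template` keyword but without an explicit template argument list.

```cpp
// Hypothetical stand-in for a helper with a static member template.
template <int N>
struct Iterate {
    // The template parameter T is deduced from the call argument.
    template <typename T>
    static constexpr T Dereference(T* ptr) { return *ptr; }
};

template <int N, typename T>
constexpr T Load(T* ptr) {
    // Form reported in the log above (diagnosed by Clang >= 19):
    //   return Iterate<N>::template Dereference(ptr);
    //
    // Alternative that keeps the keyword and names the arguments explicitly:
    //   return Iterate<N>::template Dereference<T>(ptr);

    // Form used here: no `template` keyword, T is deduced.
    return Iterate<N>::Dereference(ptr);
}
```

The `template` disambiguator is only needed when a `<` follows the dependent member name, so either dropping the keyword or spelling out the argument list should satisfy the newer Clang.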
The second type of error is a requested alignment above the 8192-byte limit:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda/std\detail/libcxx/include\__utility\..\__type_traits/aligned_storage.h(134,1): error G9DC7A175: requested alignment must be 8192 bytes or smaller
134 | _CREATE_ALIGNED_STORAGE_SPECIALIZATION(0x4000);
| ^ ~~~~~~
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda/std\detail/libcxx/include\__utility\..\__type_traits/aligned_storage.h:112:12: note: expanded from macro '_CREATE_ALIGNED_STORAGE_SPECIALIZATION'
112 | struct _ALIGNAS(n) type\
| ^ ~
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda\std\detail\libcxx/include/__config:436:23: note: expanded from macro '_ALIGNAS'
436 | # define _ALIGNAS(x) alignas(x)
| ^ ~
1 error generated when compiling for host.
The error occurs with CUB > 1.9.8 and CUB shipped internally with CUDA.
Fixed by #1537.
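To illustrate the limit in isolation, here is a rough sketch that is not taken from libcu++ and not necessarily what #1537 does. It assumes the 8192-byte cap applies only to Windows targets, since the log above comes from a Windows toolchain. `SKETCH_MAX_ALIGN` and `AlignedBlock` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical macro: the largest alignment this sketch will request.
#if defined(_WIN32)
#define SKETCH_MAX_ALIGN 0x2000  // 8192 bytes, the limit quoted in the error
#else
#define SKETCH_MAX_ALIGN 0x4000  // 16384 bytes, the value the failing line asks for
#endif

// A type whose alignment is capped by the macro above.
struct alignas(SKETCH_MAX_ALIGN) AlignedBlock {
    unsigned char data[SKETCH_MAX_ALIGN];
};

static_assert(alignof(AlignedBlock) == SKETCH_MAX_ALIGN,
              "the type carries the requested alignment");
```

Guarding the over-aligned request behind a target check keeps the 0x4000 case for toolchains that accept it.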
The third type of error is the explicit qualification required to use a member from a dependent base class:
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/../block/radix_rank_sort_operations.cuh:120:20: error: explicit qualification required to use member 'ProcessFloatMinusZero' from dependent base class
120 | return BFE(ProcessFloatMinusZero(key), bit_start, num_bits);
| ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_rank.cuh:409:50: note: in instantiation of member function 'cub::BFEDigitExtractor<double>::Digit' requested here
409 | unsigned int digit = digit_extractor.Digit(keys[ITEM]);
| ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:261:72: note: in instantiation of function template specialization 'cub::BlockRadixRank<1024, 4, false>::RankKeys<unsigned long long, 4, cub::BFEDigitExtractor<double>>' requested here
261 | AscendingBlockRadixRank(temp_storage.asending_ranking_storage).RankKeys(
| ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:343:13: note: in instantiation of member function 'cub::BlockRadixSort<double, 1024, 4>::RankKeys' requested here
343 | RankKeys(unsigned_keys, ranks, digit_extractor, is_descending);
| ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:517:9: note: in instantiation of function template specialization 'cub::BlockRadixSort<double, 1024, 4>::SortBlocked<0, 1>' requested here
517 | SortBlocked(keys, values, begin_bit, end_bit, Int2Type<false>(), Int2Type<KEYS_ONLY>());
| ^
1 error generated when compiling for host.
The error occurs with CUB shipped internally with 11.4.0 <= CUDA < 11.6.
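The diagnostic comes from two-phase name lookup: an unqualified name inside a class template is not looked up in a base class that depends on a template parameter. A minimal sketch of the pattern follows, with hypothetical names (`BaseExtractor`, `DigitExtractor`, `Process`) rather than the real CUB classes.

```cpp
// Hypothetical stand-in for a base class that depends on KeyT.
template <typename KeyT>
struct BaseExtractor {
    constexpr KeyT Process(KeyT key) const { return key; }
};

template <typename KeyT>
struct DigitExtractor : BaseExtractor<KeyT> {
    constexpr KeyT Digit(KeyT key) const {
        // Form reported in the log above (unqualified call into the dependent base):
        //   return Process(key);

        // Qualifying through `this` defers the lookup until instantiation.
        // BaseExtractor<KeyT>::Process(key) would work as well.
        return this->Process(key);
    }
};
```

Since the affected headers ship inside the CUDA toolkit, this only shows what the compiler is asking for; it is not a patch that can be applied from the HIPIFY side.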
Finally implemented
Problem Description
Currently, there are multiple build errors on CUB tests against https://github.com/NVIDIA/cub, so CUB tests are excluded from running for CUDA > 12.0. Additionally, it would be great to try again using the internal CUB shipped with CUDA.
Operating System
any
CPU
any
ROCm Version
No response
ROCm Component
HIPIFY
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response