ROCm / HIPIFY

HIPIFY: Convert CUDA to Portable C++ Code
MIT License

[HIPIFY][CUB] Restore building and testing of CUB #1534

Closed: emankov closed this issue 1 week ago

emankov commented 2 weeks ago

Problem Description

Currently, building the CUB tests against https://github.com/NVIDIA/cub fails with multiple errors, so the CUB tests are excluded from running for CUDA > 12.0. It would also be good to try again with the internal CUB shipped with CUDA.

Operating System

any

CPU

any

ROCm Version

No response

ROCm Component

HIPIFY

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

emankov commented 2 weeks ago

The first type of error concerns a missing template argument list after the template keyword:

/usr/local/cub-2.1.0/cub/block/specializations/../../block/../block/specializations/../../warp/specializations/../../thread/thread_store.cuh:337:56: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  337 |     IterateThreadStore<0, VOLATILE_MULTIPLE>::template Dereference(
      |                                                        ^
1 error generated when compiling for host.

The error occurs with LLVM >= 19 for all CUB versions; LLVM <= 18 does not report it.

Fixed by #1535.
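For illustration only, here is a minimal standalone sketch of the construct Clang 19 rejects and two standard-conforming spellings. IterateStore, StoreAt and StoreAtNoKeyword are hypothetical stand-ins for CUB's IterateThreadStore pattern; this is not necessarily how #1535 resolved it.

```cpp
#include <cassert>

// Hypothetical stand-in for CUB's IterateThreadStore: a class template
// exposing a static member function template.
template <int COUNT>
struct IterateStore {
    template <typename T>
    static void Dereference(T* ptr, T value) { ptr[COUNT] = value; }
};

template <int COUNT, typename T>
void StoreAt(T* ptr, T value) {
    // Rejected by Clang >= 19 (no argument list after `template`):
    //   IterateStore<COUNT>::template Dereference(ptr, value);
    // Option 1: keep `template` and supply an empty argument list;
    // T is still deduced from the call arguments.
    IterateStore<COUNT>::template Dereference<>(ptr, value);
}

template <int COUNT, typename T>
void StoreAtNoKeyword(T* ptr, T value) {
    // Option 2: drop `template`; it is only required when `<` follows the name.
    IterateStore<COUNT>::Dereference(ptr, value);
}
```

Both spellings compile with older and newer compilers; the first keeps the original intent explicit, while the second is shorter.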

emankov commented 2 weeks ago

The second type of error concerns the requested alignment, which must be 8192 bytes or smaller, whereas the header requests 0x4000 (16384) bytes:

C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda/std\detail/libcxx/include\__utility\..\__type_traits/aligned_storage.h(134,1): error G9DC7A175: requested alignment must be 8192 bytes or smaller
  134 | _CREATE_ALIGNED_STORAGE_SPECIALIZATION(0x4000);
      | ^                                      ~~~~~~
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda/std\detail/libcxx/include\__utility\..\__type_traits/aligned_storage.h:112:12: note: expanded from macro '_CREATE_ALIGNED_STORAGE_SPECIALIZATION'
  112 |     struct _ALIGNAS(n) type\
      |            ^        ~
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.5.0/include\cuda\std\detail\libcxx/include/__config:436:23: note: expanded from macro '_ALIGNAS'
  436 | #  define _ALIGNAS(x) alignas(x)
      |                       ^       ~
1 error generated when compiling for host.

The error occurs with CUB versions newer than 1.9.8 and with the CUB shipped internally with CUDA.

Fixed by #1537.
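As a hedged sketch (not the change made in #1537), an aligned-storage helper can reject oversized alignments with a readable static_assert instead of failing deep inside a library header. GuardedAlignedStorage and kMaxAlignment are hypothetical names; the 8192-byte limit is taken from the diagnostic above, and the Windows-style paths in the log suggest it is target-specific rather than universal.

```cpp
#include <cassert>
#include <cstddef>

// The macro in the log expands to alignas(0x4000), i.e. 16384 bytes.
static_assert(0x4000 == 16384, "0x4000 is 16 KiB");

// Assumed limit, copied from the diagnostic above.
constexpr std::size_t kMaxAlignment = 8192;

// Hypothetical guarded variant of an aligned-storage helper.
template <std::size_t Len, std::size_t Align>
struct GuardedAlignedStorage {
    static_assert(Len > 0, "storage length must be non-zero");
    static_assert(Align != 0 && (Align & (Align - 1)) == 0,
                  "alignment must be a power of two");
    static_assert(Align <= kMaxAlignment,
                  "requested alignment must be 8192 bytes or smaller");

    struct type {
        alignas(Align) unsigned char data[Len];
    };
};
```

With this guard, instantiating GuardedAlignedStorage<1, 0x4000> fails with the static_assert message, while GuardedAlignedStorage<100, 64>::type yields a 64-byte-aligned buffer of at least 100 bytes.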

emankov commented 2 weeks ago

The third type of error concerns the explicit qualification required to use a member from a dependent base class:

/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/../block/radix_rank_sort_operations.cuh:120:20: error: explicit qualification required to use member 'ProcessFloatMinusZero' from dependent base class
  120 |         return BFE(ProcessFloatMinusZero(key), bit_start, num_bits);
      |                    ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_rank.cuh:409:50: note: in instantiation of member function 'cub::BFEDigitExtractor<double>::Digit' requested here
  409 |             unsigned int digit = digit_extractor.Digit(keys[ITEM]);
      |                                                  ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:261:72: note: in instantiation of function template specialization 'cub::BlockRadixRank<1024, 4, false>::RankKeys<unsigned long long, 4, cub::BFEDigitExtractor<double>>' requested here
  261 |         AscendingBlockRadixRank(temp_storage.asending_ranking_storage).RankKeys(
      |                                                                        ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:343:13: note: in instantiation of member function 'cub::BlockRadixSort<double, 1024, 4>::RankKeys' requested here
  343 |             RankKeys(unsigned_keys, ranks, digit_extractor, is_descending);
      |             ^
/usr/local/cuda-11.5.2/include/cub/block/specializations/../../block/block_radix_sort.cuh:517:9: note: in instantiation of function template specialization 'cub::BlockRadixSort<double, 1024, 4>::SortBlocked<0, 1>' requested here
  517 |         SortBlocked(keys, values, begin_bit, end_bit, Int2Type<false>(), Int2Type<KEYS_ONLY>());
      |         ^
1 error generated when compiling for host.

The error occurs with the CUB shipped internally with CUDA, from 11.4.0 up to, but not including, 11.6.
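The usual C++ remedy for this diagnostic is to qualify the call, via this-> or the base class name, so the name is looked up in the dependent base at instantiation time. Below is a minimal sketch of that pattern. BaseExtractor, BitfieldExtractor and ProcessKey are hypothetical stand-ins for CUB's digit extractor and ProcessFloatMinusZero, and the identity transform is a placeholder, not CUB's real logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUB's digit-extractor base class.
template <typename KeyT>
struct BaseExtractor {
    // Placeholder transform (identity); the real CUB helper does more.
    static std::uint32_t ProcessKey(std::uint32_t key) { return key; }
};

template <typename KeyT>
struct BitfieldExtractor : BaseExtractor<KeyT> {
    std::uint32_t bit_start;
    std::uint32_t num_bits;  // assumed 0 < num_bits < 32

    BitfieldExtractor(std::uint32_t start, std::uint32_t bits)
        : bit_start(start), num_bits(bits) {}

    std::uint32_t Digit(std::uint32_t key) const {
        // Rejected: unqualified lookup does not search the dependent base
        //   return Extract(ProcessKey(key));
        // Fix: qualify through `this` (or BaseExtractor<KeyT>::ProcessKey).
        return Extract(this->ProcessKey(key));
    }

    // Extract num_bits bits starting at bit_start.
    std::uint32_t Extract(std::uint32_t x) const {
        return (x >> bit_start) & ((1u << num_bits) - 1u);
    }
};
```

For example, BitfieldExtractor<double>(4, 4).Digit(0xAB) extracts bits 4..7 and returns 0xA.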

emankov commented 1 week ago

Finally implemented.