[HIPIFY][Issue] Error loading cub #1568

Closed by tkramer-motion 6 days ago

tkramer-motion commented 1 week ago

Problem Description

When running hipify-clang on a .cu file that includes cub.cuh, I get the following error. This happens with both CUDA 11.7 and CUDA 11.8.

In file included from /usr/local/cuda-11.7/include/cub/cub.cuh:43:
/usr/local/cuda-11.7/include/cub/block/block_load.cuh:401:13: error: no matching 'operator new' function for non-allocating placement new expression; include <new>
            new(&items[ITEM]) InputT(block_itr[warp_offset + tid + (ITEM * CUB_PTX_WARP_THREADS)]);
            ^~~
/usr/local/cuda-11.7/include/cub/block/block_load.cuh:436:5: note: in instantiation of function template specialization 'cub::LoadDirectWarpStriped<unsigned int, 9, const unsigned int *>' requested here
    LoadDirectWarpStriped(linear_tid, block_itr, items, valid_items);
    ^
/usr/local/cuda-11.7/include/cub/block/block_load.cuh:992:13: note: in instantiation of function template specialization 'cub::LoadDirectWarpStriped<unsigned int, unsigned int, 9, const unsigned int *>' requested here
            LoadDirectWarpStriped(linear_tid, block_itr, items, valid_items, oob_default);
            ^
/usr/local/cuda-11.7/include/cub/block/block_load.cuh:1285:48: note: in instantiation of function template specialization 'cub::BlockLoad<unsigned int, 128, 9, cub::BLOCK_LOAD_WARP_TRANSPOSE>::LoadInternal<cub::BLOCK_LOAD_WARP_TRANSPOSE, 0>::Load<const unsigned int *, unsigned int>' requested here
        InternalLoad(temp_storage, linear_tid).Load(block_itr, items, valid_items, oob_default);
                                               ^
/usr/local/cuda-11.7/include/cub/device/dispatch/dispatch_radix_sort.cuh:313:43: note: in instantiation of function template specialization 'cub::BlockLoad<unsigned int, 128, 9, cub::BLOCK_LOAD_WARP_TRANSPOSE>::Load<const unsigned int *, unsigned int>' requested here
    BlockLoadKeys(temp_storage.load_keys).Load(d_keys_in, keys, num_items, default_key);
                                          ^
/usr/local/cuda-11.7/include/cub/device/dispatch/dispatch_radix_sort.cuh:1569:17: note: in instantiation of function template specialization 'cub::DeviceRadixSortSingleTileKernel<cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy800, false, unsigned int, unsigned int, int>' requested here
                DeviceRadixSortSingleTileKernel<MaxPolicyT, IS_DESCENDING, KeyT, ValueT, OffsetT>);
                ^
/usr/local/cuda-11.7/include/cub/block/../iterator/../util_device.cuh:703:28: note: (skipping 5 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        return op.template Invoke<PolicyT>();
                           ^
/usr/local/cuda-11.7/include/cub/block/../iterator/../util_device.cuh:686:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<620, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy620, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy610>::Invoke<cub::DispatchRadixSort<false, unsigned int, unsigned int, int>>' requested here
           return PrevPolicyT::Invoke(ptx_version, op);
                               ^
/usr/local/cuda-11.7/include/cub/block/../iterator/../util_device.cuh:686:32: note: in instantiation of function template specialization 'cub::ChainedPolicy<700, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy700, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy620>::Invoke<cub::DispatchRadixSort<false, unsigned int, unsigned int, int>>' requested here
/usr/local/cuda-11.7/include/cub/device/dispatch/dispatch_radix_sort.cuh:1615:46: note: in instantiation of function template specialization 'cub::ChainedPolicy<800, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy800, cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy700>::Invoke<cub::DispatchRadixSort<false, unsigned int, unsigned int, int>>' requested here
            if (CubDebug(error = MaxPolicyT::Invoke(ptx_version, dispatch))) break;
                                             ^
/usr/local/cuda-11.7/include/cub/device/device_radix_sort.cuh:213:65: note: in instantiation of member function 'cub::DispatchRadixSort<false, unsigned int, unsigned int, int>::Dispatch' requested here
        return DispatchRadixSort<false, KeyT, ValueT, OffsetT>::Dispatch(
                                                                ^
/tmp/hilbert_sort.cu-362ebf.hip:37:37: note: in instantiation of function template specialization 'cub::DeviceRadixSort::SortPairs<unsigned int, unsigned int>' requested here
    gpuErrchk(cub::DeviceRadixSort::SortPairs
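
For readers unfamiliar with the first diagnostic: Clang reports "no matching 'operator new' function for non-allocating placement new expression" when a placement new expression is compiled without a visible declaration of the placement form of operator new, which the standard header <new> provides. The plain host-side C++ sketch below shows the same construct in isolation. The names Item and construct_in_place are hypothetical and exist only for illustration. This is not CUB code and not a verified workaround for the hipify-clang failure above.

```cpp
#include <cassert>
#include <new>  // declares the non-allocating placement operator new

// Hypothetical element type, standing in for InputT in the log above.
struct Item {
    unsigned int value;
    explicit Item(unsigned int v) : value(v) {}
};

// Construct an Item inside caller-provided storage, the same kind of
// expression as `new(&items[ITEM]) InputT(...)` in block_load.cuh.
unsigned int construct_in_place(unsigned int v) {
    alignas(Item) unsigned char storage[sizeof(Item)];
    Item* p = new (storage) Item(v);  // placement new: no memory is allocated
    unsigned int result = p->value;
    p->~Item();                       // destroy explicitly; storage is not freed
    return result;
}
```

Calling construct_in_place(42u) returns 42. If the #include <new> line is removed and no other header pulls it in, a compiler may reject the placement new expression with a diagnostic like the one in the log.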

Operating System

Ubuntu 22.04

CPU

AMD EPYC 7R32

ROCm Version

ROCm 5.7.1

ROCm Component

HIPIFY

Steps to Reproduce

Run hipify-clang on a .cu file that includes cub.cuh.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

tkramer-motion commented 6 days ago

Seems fixed in 6.1.2

emankov commented 6 days ago

> Seems fixed in 6.1.2

Yes, fixed in the newer version of Clang shipped with ROCm HIP 6.x.