Closed PointKernel closed 4 months ago
I think #502 unblocks this issue.
I think #502 unblocks this issue.
Right. But #502 is blocked by some CCCL issues.
If this moves forward, I can get rid of the walkaround in my work.
502 is merged now. I had to go through big trouble to use cuco in my library, I had to make a separate target and constrain the CUDA architecture list to Volta+ only while rest of the sources are compiled for all given archs.
If this moves forward, I can get rid of the walkaround in my work.
This work is on my radar but I'm not following the issue you brought up here. The current cuco should work fine on Pascal or newer GPUs.
502 is merged now. I had to go through big trouble to use cuco in my library, I had to make a separate target and constrain the CUDA architecture list to Volta+ only while rest of the sources are compiled for all given archs.
If this moves forward, I can get rid of the walkaround in my work.
This work is on my radar but I'm not following the issue you brought up here. The current cuco should work fine on Pascal or newer GPUs.
We compile our library for older GPUs as well such as sm_52, that is what I had meant, filtering these older architectures and creating a seperate target for the codes using cuco. https://github.com/dmlc/dgl/blob/713ffb5714dbd0bd92b26d6412f5040092179ee3/graphbolt/CMakeLists.txt#L61-L69
Thank you for your work!
Or can I already start using cuco when I am compiling for sm_52, sm_60, sm_70, sm_80 etc, especially including sm_52 since it does not have the required support. Previously, I was getting a compilation error because cuda/atomic
header was not importable when the minimum architecture is older than Pascal.
Note: It is ensured that cuco code path is not taken if running on an old GPU. This has to do with compilation flow.
Previously, I was getting a compilation error because cuda/atomic header was not importable when the minimum architecture is older than Pascal.
Yeah, I remember this but I'm not sure whether the new cuda/atomic
with CCCL upgrade can solve the issue. Will back to you.
@mfbalin FYI, the corresponding PR gets merged but I think you still need a separate target for old GPUs.
@mfbalin FYI, the corresponding PR gets merged but I think you still need a separate target for old GPUs.
Is it because https://github.com/NVIDIA/cccl/issues/1083 is still open?
@mfbalin FYI, the corresponding PR gets merged but I think you still need a separate target for old GPUs.
Is it because NVIDIA/cccl#1083 is still open?
Yes
Is your feature request related to a problem? Please describe.
The current cuco implementations use custom atomic functions, e.g. https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L904-L936 due to a performance regression with
cuda::atomic_ref
(https://github.com/NVIDIA/cccl/issues/1008). With the fix being merged into the main branch, we can get rid of those custom functions once CCCL 2.4 is fetched byrapids-cmake
Describe the solution you'd like
Replace https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L905 https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L947 https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/hyperloglog/hyperloglog_ref.cuh#L525 with corresponding
atomic_ref
operations.