Open sleeepyjack opened 1 month ago
Convince CCCL to expose cuda::atomic_ref::compare_exchange_* for 16B types
Discussion thread (NVIDIA internal): https://nvidia.slack.com/archives/CCP05T27R/p1721095033011529
Is your feature request related to a problem? Please describe.
The packed_cas update routine shows better performance compared to back_to_back_cas and cas_dependent_write. On sm_90 and higher we have hardware support for 16B atomic CAS, which we currently don't make use of.
Describe the solution you'd like
16B atomicCAS was introduced with CUDA 12.3 (see docs).
Idea: Add a dedicated codepath for sm_90+ by adding something like
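A minimal sketch of what such a codepath could look like, assuming the 128-bit template overload of atomicCAS described in the CUDA C++ Programming Guide (it requires a trivially copyable type with sizeof == 16 and alignof >= 16). The names slot16 and try_packed_cas are hypothetical and are not taken from the existing code or from the original proposal:

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical 16-byte slot: an 8-byte key packed next to an 8-byte payload.
struct alignas(16) slot16 {
  std::uint64_t key;
  std::uint64_t value;
};

static_assert(sizeof(slot16) == 16, "128-bit atomicCAS needs a 16-byte type");
static_assert(alignof(slot16) >= 16, "128-bit atomicCAS needs 16-byte alignment");

// Hypothetical helper: tries to swap `expected` for `desired` in one step.
// Returns true when this thread's CAS installed `desired`.
__device__ bool try_packed_cas(slot16* slot, slot16 expected, slot16 desired)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  // sm_90+: a single 128-bit transaction covers key and payload together.
  slot16 const old = atomicCAS(slot, expected, desired);
  return old.key == expected.key && old.value == expected.value;
#else
  // Older architectures have no 16B CAS; a real implementation would
  // dispatch to back_to_back_cas or cas_dependent_write here instead.
  return false;
#endif
}
```

The architecture guard keeps the translation unit compiling for targets below sm_90; how the fallback is selected in practice is left open here.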
Describe alternatives you've considered
Convince CCCL to expose cuda::atomic_ref::compare_exchange_* for 16B types ;)
Additional context
No response