albertomercurio opened this issue 1 year ago
CUDA.atomic_add!(CUDA.pointer(b, 1), a[tid])
The `atomic_add!` family of functions directly maps onto hardware features, as the docstring mentions, so it is expected that `Complex` etc. are not supported.
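One common workaround for summation is to split the complex atomic add into two hardware-supported real atomic adds, one per component. This is correct for sums because the real and imaginary parts accumulate independently, though the complex value as a whole is never updated atomically. The sketch below is illustrative, not part of CUDA.jl's documented API surface: it assumes `ComplexF32` (so each component fits a 32-bit atomic) and reinterprets the accumulator on the host before launch; the kernel and variable names are hypothetical.

```julia
using CUDA

# Sketch: accumulate a complex sum via two real atomic adds per element.
# `br` is the ComplexF32 accumulator reinterpreted as interleaved Float32s,
# so br[1] is the real part and br[2] the imaginary part of b[1].
function accum_kernel!(br, a)
    tid = threadIdx().x
    CUDA.@atomic br[1] += real(a[tid])   # real component
    CUDA.@atomic br[2] += imag(a[tid])   # imaginary component
    return nothing
end

a = CUDA.rand(ComplexF32, 256)
b = CUDA.zeros(ComplexF32, 1)
br = reinterpret(Float32, b)             # length-2 Float32 view of b
@cuda threads=256 accum_kernel!(br, a)
```

This only helps for component-wise operations like addition; it does not give you an atomic read-modify-write of the full 64- or 128-bit complex value.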
`@atomic` could support it if the logic in https://github.com/JuliaGPU/CUDA.jl/blob/bb37b50006295833d5396d1c7b330eec55b408e4/src/device/intrinsics/atomics.jl#L204-L208 were extended (https://github.com/JuliaLang/julia/pull/47116 can probably help simplify that logic), but you would still be limited to the maximal width of atomic operation your hardware supports. That means `ComplexF64` currently cannot be supported by `CUDA.@atomic`.
It may be possible to make our generic fallback handle data types wider than the atomics supported by the hardware, e.g. by taking a lock, but that would be slow, tricky to implement (i.e. to avoid deadlocks on hardware without a forward-progress guarantee), and would require a different API, as we can't easily allocate a global-memory lock automatically from kernel code.
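For completeness, a lock-based fallback along those lines might look like the sketch below, with the lock allocated by the caller in global memory (which is exactly the API wrinkle mentioned above). Everything here is illustrative: the kernel name and `lck` are hypothetical, and on hardware without independent thread scheduling (pre-Volta) a per-thread spinlock like this can deadlock within a warp, so it is a sketch of the idea rather than a recommended pattern.

```julia
using CUDA

# Sketch: serialize wide (e.g. ComplexF64) updates behind a spinlock.
# `lck` is a caller-allocated one-element Int32 array: 0 = free, 1 = held.
function locked_sum_kernel!(lck, b, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        # acquire: spin until we swap 0 -> 1
        while CUDA.atomic_cas!(pointer(lck, 1), Int32(0), Int32(1)) != Int32(0)
        end
        b[1] += a[i]   # critical section: plain, non-atomic complex add
        CUDA.atomic_xchg!(pointer(lck, 1), Int32(0))   # release
    end
    return nothing
end
```

Note how the serialization defeats the point of running thousands of threads; this is why the atomic-free reduction below is the preferred approach.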
What is the workaround for the moment? How can I reduce a complex vector in a CUDA kernel? I need this for a more complex kernel I plan to implement.
> How can I reduce a complex vector in a CUDA kernel?
Our mapreduce kernel does not use atomic operations. You should write your reduction similarly so that it doesn't require atomic operations, or pass a lock-like variable to your kernel that you use to protect the 128-bit data.
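A minimal sketch of such an atomic-free reduction is a standard shared-memory tree reduction that writes one partial sum per block and is finished on the host. The names, the fixed block size of 256, and the `ComplexF32` element type are all assumptions for illustration; CUDA.jl's actual `mapreduce` kernel is more sophisticated.

```julia
using CUDA

# Sketch: block-level tree reduction of a complex vector, no atomics needed.
# Assumes blockDim().x == 256; writes one partial sum per block into `out`.
function reduce_kernel!(out, a)
    tid = threadIdx().x
    shared = CuStaticSharedArray(ComplexF32, 256)
    i = (blockIdx().x - 1) * blockDim().x + tid
    shared[tid] = i <= length(a) ? a[i] : zero(ComplexF32)
    sync_threads()
    s = blockDim().x ÷ 2
    while s >= 1
        if tid <= s
            shared[tid] += shared[tid + s]   # pairwise partial sums
        end
        sync_threads()
        s ÷= 2
    end
    if tid == 1
        out[blockIdx().x] = shared[1]        # one partial sum per block
    end
    return nothing
end

a = CUDA.rand(ComplexF32, 4096)
nblocks = cld(length(a), 256)
out = CUDA.zeros(ComplexF32, nblocks)
@cuda threads=256 blocks=nblocks reduce_kernel!(out, a)
total = sum(Array(out))   # finish the small remaining reduction on the host
```

Because each block only ever touches its own shared-memory scratchpad and its own slot of `out`, no cross-block synchronization or atomics are required; the per-block partials can also be reduced by a second kernel launch instead of on the host.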
The minimal working example showing my problem is the following:
It returns the following error:
It doesn't work with either `CUDA.@atomic` or `CUDA.atomic_add!`.