JuliaGPU / KernelAbstractions.jl

Heterogeneous programming in Julia

Atomic operations on complex numbers #497

Open nHackel opened 2 months ago

nHackel commented 2 months ago

As requested by @vchuravy, this is a copy of my Slack question:

Hello, I'm running into an error when combining KernelAbstractions.jl, atomic operations via Atomix.jl, and complex numbers. Is it possible to perform atomic operations on ComplexF32?

As an MWE, we can take the atomic operations example from the documentation and create img as an array of ComplexF32:

using CUDA, KernelAbstractions, Atomix

img = zeros(ComplexF32, (50, 50));
img[10:20, 10:20] .= 1;
img[35:45, 35:45] .= 2;

function index_fun_fixed(arr; backend=get_backend(arr))
    out = similar(arr)
    fill!(out, 0)
    kernel! = my_kernel_fixed!(backend)
    kernel!(out, arr, ndrange=(size(arr, 1), size(arr, 2)))
    return out
end

@kernel function my_kernel_fixed!(out, arr)
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 1)
        Atomix.@atomic out[k, i] += arr[i, j]
    end
end

index_fun_fixed(CuArray(img))
index_fun_fixed(img)

On a GPU I get the error:

out_fixed = Array(index_fun_fixed(CuArray(img)));
ERROR: a error was thrown during kernel execution on thread (65, 1, 1) in block (3, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).

ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce GTX 1080 Ti

and on CPU I get:

out_fixed = Array(index_fun_fixed(img));
ERROR: TaskFailedException

    nested task error: MethodError: no method matching modify!(::Ptr{ComplexF32}, ::typeof(+), ::ComplexF32, ::UnsafeAtomics.Internal.LLVMOrdering{:seq_cst})

    Closest candidates are:
      modify!(::Ptr{T}, ::typeof(UnsafeAtomics.right), ::T, ::Any) where T
       @ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:197
      modify!(::Core.LLVMPtr, ::OP, ::Any, ::UnsafeAtomics.Ordering) where OP
       @ UnsafeAtomicsLLVM ~/.julia/packages/UnsafeAtomicsLLVM/tbohS/src/internal.jl:20
      modify!(::Any, ::Any, ::Any)
       @ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:4
      ...

    Stacktrace:
     [1] modify!
       @ ~/.julia/packages/Atomix/F9VIX/src/core.jl:33 [inlined]
     [2] macro expansion
       @ ./REPL[55]:4 [inlined]
     [3] cpu_my_kernel_fixed!
       @ ~/.julia/packages/KernelAbstractions/HAcqg/src/macros.jl:287 [inlined]
     [4] cpu_my_kernel_fixed!(__ctx__::KernelAbstractions.CompilerMetadata{…}, out::Matrix{…}, arr::Matrix{…})

Accessing the real and imaginary parts individually like this:

@kernel function my_kernel_fixed!(out, arr::AbstractArray{<:Complex})
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 1)
        Atomix.@atomic out[k, i].re += arr[i, j].re
        Atomix.@atomic out[k, i].im += arr[i, j].im
    end
end

results in the following error:

ERROR: TaskFailedException

    nested task error: ConcurrencyViolationError("modifyfield!: non-atomic field cannot be written atomically")

A fairly hacky workaround is to use reinterpret, but I'm not sure whether that is safe to do:

function index_fun_reinterpret(arr::AbstractArray{<:Complex}; backend=get_backend(arr))
    out = similar(arr)
    fill!(out, 0)
    kernel! = my_kernel_reinterpret!(backend)
    kernel!(reinterpret(reshape, Float32, out), arr, ndrange=(size(arr, 1), size(arr, 2)))
    return out
end

@kernel function my_kernel_reinterpret!(out, arr::AbstractArray{<:Complex})
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 2)
        Atomix.@atomic out[1, k, i] += arr[i, j].re
        Atomix.@atomic out[2, k, i] += arr[i, j].im
    end
end
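
For reference, here is a minimal CPU-only sketch of the array layout the workaround relies on. It uses only Base (no GPU, KernelAbstractions, or Atomix), and the variable names are illustrative. It also assumes that `real(eltype(out))` is an acceptable replacement for the hard-coded Float32, which would let the same idea cover ComplexF64.

```julia
# CPU-only illustration of the layout; no atomics are involved here.
out = zeros(ComplexF32, 3, 2)

# real(ComplexF32) is Float32, so the component type need not be hard-coded.
T = real(eltype(out))

# reinterpret(reshape, ...) adds a leading dimension of length 2:
# index 1 addresses the real part, index 2 the imaginary part.
parts = reinterpret(reshape, T, out)
println(size(parts))   # (2, 3, 2)

# Writing through the reinterpreted view updates the complex array in place.
parts[1, 2, 1] += 1.5f0   # real part of out[2, 1]
parts[2, 2, 1] -= 0.5f0   # imaginary part of out[2, 1]
println(out[2, 1])
```

Because the view shares memory with out, each complex element is updated as two independent Float32 slots.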

vchuravy commented 2 months ago

On the issue of safety: this can lead to "torn" updates, e.g. one thread updating re while another updates im. Since you are doing an accumulation, that should be fine. We would need to support 16-byte-wide operations, but that would also turn your accumulate operation into a cmpswap loop.
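
To make the "cmpswap loop" concrete, here is a rough CPU-only sketch using `Threads.Atomic` and `Threads.atomic_cas!` from Base. It is not how Atomix.jl or KernelAbstractions.jl implement anything. The helpers `pack`, `unpack`, and `atomic_add_complex!` are hypothetical names introduced for illustration. The sketch assumes a ComplexF32 value, which occupies 8 bytes and therefore fits in a single UInt64 cell.

```julia
using Base.Threads: Atomic, atomic_cas!, @threads

# Hypothetical helpers: store a ComplexF32 in 64 bits (re in the low half, im in the high half).
pack(z::ComplexF32) =
    UInt64(reinterpret(UInt32, real(z))) | (UInt64(reinterpret(UInt32, imag(z))) << 32)
unpack(u::UInt64) =
    ComplexF32(reinterpret(Float32, u % UInt32), reinterpret(Float32, (u >> 32) % UInt32))

# Hypothetical accumulate: retry until no other thread changed the cell in between.
function atomic_add_complex!(cell::Atomic{UInt64}, z::ComplexF32)
    old = cell[]
    while true
        desired = pack(unpack(old) + z)
        seen = atomic_cas!(cell, old, desired)   # returns the value found in the cell
        seen == old && return unpack(desired)    # swap succeeded
        old = seen                               # lost the race; recompute from the fresh value
    end
end

cell = Atomic{UInt64}(pack(zero(ComplexF32)))
@threads for n in 1:1000
    atomic_add_complex!(cell, ComplexF32(1, 2))
end
result = unpack(cell[])
println(result)
```

Here both components change together, so no reader can observe a torn value. The cost is a retry loop in place of two independent atomic additions.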