JuliaORNL / JACC.jl

CPU/GPU parallel performance portable layer in Julia via functions as arguments
MIT License

Need set of common `atomic_*` functions (and maybe `@atomic` macro) #55

Closed: PhilipFackler closed this issue 5 months ago

pedrovalerolara commented 5 months ago

This is a great point! Thank you, @PhilipFackler, for opening this issue. I have taken a look at how atomic operations are handled in Julia. It looks like there is a common syntax using `@atomic` (compatible with CUDA.jl and AMDGPU.jl; I am not sure about oneAPI.jl). Everything is based on Atomix.jl: https://github.com/JuliaConcurrent/Atomix.jl

We can see some examples using `@atomic` in CUDA:
https://github.com/JuliaGPU/CUDA.jl/blob/9988e30fee4aab07576e24fe630594d4c30a2f32/src/indexing.jl#L105
https://discourse.julialang.org/t/how-to-use-atomic-with-cuda/47761
and in AMDGPU:
https://amdgpu.juliagpu.org/stable/kernel_programming/#Atomics
https://amdgpu.juliagpu.org/stable/kernel_programming/#AMDGPU.Compiler.hipfunction

So, if the use of `@atomic` is compatible with all the backends, we should not need to add anything to JACC for this; we can just use it where necessary. @PhilipFackler, do you have a good test in mind that we can use for this?
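For illustration, here is a minimal sketch (not from the thread) of the Atomix.jl syntax being discussed, shown on a plain CPU array with Julia threads; the `counts` name and the iteration count are arbitrary:

```julia
# Minimal CPU sketch of the Atomix.jl `@atomic` syntax discussed above.
using Atomix: @atomic

counts = zeros(Int, 1)           # a single shared counter
Threads.@threads for i in 1:1000
    @atomic counts[1] += 1       # atomic read-modify-write; no lost updates
end
@assert counts[1] == 1000
```

The appeal is that the same `@atomic arr[i] op= x` spelling is what CUDA.jl and AMDGPU.jl accept inside device kernels, so portable JACC code would not need backend-specific branches.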

PhilipFackler commented 5 months ago

@pedrovalerolara I think a good starting point would be an array with one element (initialized to 0), and then a `parallel_for` in which every thread performs an atomic increment on that element. Then check whether the element equals the number of loop iterations. That would be basic enough to make sure the compiler is happy on all the backends.
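A sketch of that test, assuming JACC's documented `parallel_for(N, f, args...)` calling convention and that `Atomix.@atomic` compiles on every backend; the `atomic_counter` name and the value of `N` are placeholders:

```julia
import JACC
using Atomix: @atomic

# Kernel: every index performs one atomic increment on the single element.
function atomic_counter(i, counter)
    @atomic counter[1] += 1
    return nothing
end

N = 1024
counter = JACC.Array(zeros(Int32, 1))   # one element, starting at 0
JACC.parallel_for(N, atomic_counter, counter)
@assert Base.Array(counter)[1] == N     # copy to host; all N increments must land
```

If any backend rejects `@atomic`, the final assertion (or the kernel compilation itself) would flag it, which is exactly what the test is meant to catch.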