JuliaArrays / ShiftedArrays.jl

Lazy shifted arrays for data analysis in Julia
Other
51 stars 10 forks source link

CUDA support #64

Open roflmaostc opened 1 year ago

roflmaostc commented 1 year ago

Hi,

how can we add CUDA support for this?

julia> ShiftedArrays.fftshift(CUDA.rand(2,2))
2×2 CircShiftedArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f8167e4e6e0.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90
 0.332543  0.906493
 0.51592   0.40837

Best,

Felix

piever commented 1 year ago

I think it's just a problem with displaying things. The actual fftshift method shouldn't call getindex. Might be worth to inquire over at CUDA.jl as to whether there's a way for array wrappers to avoid this warning when displaying the array.

RainerHeintzmann commented 1 year ago

Overloading the display can be done. However the real problem is that all broadcasting operations fall back on get_index() calls. This causes CUDA.jl, depending on the settings to either fail or be really slow. A sensible solution would be to implement broadcasting rules for ShiftedArrays. Ideally these rules would be able to broadcast between arrays of the same shifts, of different shifts and between other AbstractArrays. For a circshifted array one probably needs to calculate intersection points and split the arrays into seperate parts. What do you think? Who knows how to implement proper broadcasting rules?