mcabbott opened 1 year ago
FWIW, CUDA.jl intentionally does not do this, because it would imply allocating a device array on every kernel launch, at least when generalizing this to nested `CuArray`s. Currently, we only allow passing a `CuArray`, because we can directly take its pointer and wrap it in a GPU object (a `CuDeviceArray`) without having to allocate a GPU array. If we supported nested arrays, say `CuArray{CuArray}`, we'd first need to download all elements back to the CPU (expensive), convert each element to a `CuDeviceArray` (cheap), allocate a new array (expensive), and upload the converted elements to it (expensive).

This could work fine with `Vector{CuArray}`, because CPU allocations are much cheaper, so I guess we could support adapting those just like we adapt tuples. But the difference in behavior is not ideal.
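The tuple-like treatment mentioned above could be sketched with Adapt.jl's `adapt_structure` extension point. This is a hypothetical sketch of adapting a CPU `Vector` of arrays element-wise, not CUDA.jl's actual implementation:

```julia
using Adapt

# Hypothetical: treat a CPU Vector of arrays as a container rather than
# storage, adapting each element and allocating only a new CPU Vector
# (cheap), mirroring how Adapt.jl already recurses into Tuples.
Adapt.adapt_structure(to, xs::Vector{<:AbstractArray}) =
    map(x -> adapt(to, x), xs)
```

This keeps the outer `Vector` on the CPU, which is why the cost argument above differs from the `CuArray{CuArray}` case.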
I wonder whether `adapt` should treat an Array of Arrays as a container, like a Tuple, rather than as storage to be converted: convert the innermost Array, not the outermost.

I'm not exactly sure what the rule would be; perhaps something like `isbitstype(eltype(x))` is enough?

This came up in https://github.com/JuliaGPU/CUDA.jl/pull/1769, where `CuIterator` at present produces a `Vector{CuArray}`.
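One way the suggested `isbitstype` rule could look as a standalone sketch (the helper name `adapt_or_recurse` is made up for illustration, and is not part of Adapt.jl):

```julia
using Adapt

# Hypothetical rule: an Array whose elements are plain bits is storage
# and gets converted directly; anything else is treated as a container,
# recursing so that only the innermost arrays are converted.
function adapt_or_recurse(to, x::AbstractArray)
    if isbitstype(eltype(x))
        adapt(to, x)                            # storage: convert outright
    else
        map(el -> adapt_or_recurse(to, el), x)  # container: recurse
    end
end
```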