Closed: mcabbott closed this issue 2 years ago.
The problem is that Adapt.jl doesn't really promise to implement any adaptors; it just helps you create them. However, because people seem to expect `adapt(Array, ...)` to be available, I added a simple adaptor in https://github.com/JuliaGPU/Adapt.jl/blob/53a45fdf877331d672fe3b14a4d960831e5d856e/src/arrays.jl. Meanwhile, `adapt(CuArray, ...)` as implemented in CUDA.jl is more opinionated, and won't convert anything that's already GPU compatible: https://github.com/JuliaGPU/CUDA.jl/blob/ef3d6a8da5a23e0266c803085aabd20fea5c6155/src/array.jl#L318-L322.
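Concretely, the two adaptors treat a range differently; a minimal sketch, assuming CUDA.jl and Adapt.jl are loaded and a GPU is available (exact result types depend on the installed versions):

```julia
using Adapt, CUDA

adapt(Array, cu(ones(2)))  # Adapt.jl's simple adaptor: copies back to a plain Vector{Float32}
adapt(CuArray, 1:4)        # CUDA.jl's adaptor: the range is already GPU compatible, so it is left alone
```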
In a way the asymmetry isn't intended, but the two implementations aren't really related either. From a user perspective, of course, they do look awfully similar. Do you have a change in mind?
Thanks for the details. I don't really have a change in mind, no. I can see how both make some sense, and that perhaps there isn't a universal answer to what "move this to the GPU" should mean, e.g. for a range, which `adapt(CuArray, ...)` does not alter:
```julia
julia> cu(ones(2)) .+ (1:4)'
2×4 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.0  3.0  4.0  5.0
 2.0  3.0  4.0  5.0

julia> cu(ones(3,3)) * (1:3)
ERROR: Scalar indexing is disallowed.
...
  [4] generic_matvecmul!(C::Vector{Float32}, tA::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::UnitRange{Int64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
```
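For comparison, materialising the range eagerly keeps the whole multiply on the GPU; a minimal sketch, assuming CUDA.jl is loaded:

```julia
using CUDA

A = cu(ones(3, 3))              # CuArray{Float32}
x = cu(collect(Float32, 1:3))   # materialise the range as a CuArray before multiplying
A * x                           # runs as a GPU matvec, no scalar-indexing fallback
```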
The context is wondering how much Flux's `cpu` / `gpu` functions should diverge from these. For reasons like this `*` example, I think its `gpu` needs to be more eager about materialising CuArrays.
I wouldn't mind making `adapt(CuArray, ...)` materialize more, but that might be breaking. Why not build `gpu` and `cpu` on Adapt directly and do the conversions you want (and, e.g., keep them symmetric)? It's pretty trivial to do so, and both CUDA.jl and GPUArrays.jl have a couple of them, e.g. for displaying stuff: https://github.com/JuliaGPU/GPUArrays.jl/blob/bb9ca6d1f11e82a1d495cb9cf39cee9c215491e0/src/host/abstractarray.jl#L44-L49
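A minimal sketch of that suggestion, with hypothetical adaptor names (`ToGPU` and `ToCPU` are illustrative, not existing types in CUDA.jl or GPUArrays.jl):

```julia
using Adapt, CUDA

# Hypothetical adaptor types; the names are illustrative only.
struct ToGPU end
struct ToCPU end

# adapt_storage decides what happens to leaf arrays. Here the GPU direction is
# eager: anything array-like, including ranges, becomes a CuArray.
Adapt.adapt_storage(::ToGPU, x::AbstractArray) = CuArray(x)
Adapt.adapt_storage(::ToCPU, x::AbstractArray) = Array(x)

# Adapt's adapt_structure handling rebuilds wrappers (Adjoint, SubArray, ...)
# around the converted leaves, keeping the two directions symmetric.
gpu(x) = adapt(ToGPU(), x)
cpu(x) = adapt(ToCPU(), x)

gpu(1:3)   # a CuArray, unlike adapt(CuArray, 1:3), which leaves the range as-is
```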
Ah nice. Making a new struct you own is what I was about to suggest, as a way to exploit `adapt_structure` with more control.
For now I did https://github.com/FluxML/Flux.jl/pull/1704 which is a bit more brute-force. Maybe ideally we'd teach `adapt_structure` and Functors.jl to talk to each other more, since that also handles nested structures around arrays.
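The brute-force shape of that, as a minimal sketch (not the exact code in the PR): walk the model with `Functors.fmap` and convert each leaf array eagerly, materialising ranges as well.

```julia
using Functors, CUDA

# Walk any @functor-annotated model (and tuples/NamedTuples) and convert the
# leaf arrays eagerly; a real implementation would special-case arrays that
# are already on the right device to avoid needless copies.
gpu(m) = fmap(x -> x isa AbstractArray ? CuArray(x) : x, m)
cpu(m) = fmap(x -> x isa AbstractArray ? Array(x) : x, m)

gpu((weight = rand(2, 2), bias = 1:2))   # NamedTuple of CuArrays
```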
Sounds good. I'll close this then.
> For now I did FluxML/Flux.jl#1704 which is a bit more brute-force. Maybe ideally we'd teach `adapt_structure` and Functors.jl to talk to each other more, since that also handles nested structures around arrays.
I've been wanting to make `@functor` also generate `adapt_structure` for some time now, but have yet to get around to it.
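For reference, here is what such a generated definition could look like, written by hand for a hypothetical layer type (Adapt.jl's `@adapt_structure` macro produces essentially this kind of method, if I remember right):

```julia
using Adapt

# Hypothetical layer type, for illustration only.
struct Affine{M, V}
    weight::M
    bias::V
end

# Rebuild the struct with every field passed through adapt, so that
# adapt(CuArray, layer), adapt(Array, layer), or a custom adaptor recurses
# into user-defined containers much like Functors.fmap would.
Adapt.adapt_structure(to, a::Affine) =
    Affine(Adapt.adapt(to, a.weight), Adapt.adapt(to, a.bias))
```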