SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Augmented neural ODE on GPU errors out #329

Closed: SebastianCallh closed this issue 4 years ago

SebastianCallh commented 4 years ago

Trying to run the ANODE example from the docs on the GPU, by moving data, labels, model, and parameters to the GPU with the gpu function, gives ArgumentError: cannot take the CPU address of a CuArrays.CuArray{Float32,2,Nothing} when passing data to the model.

Rewriting the code to use "standard" Flux constructs instead of FastChain, FastDense, etc. gives the error AugmentedNDELayer(::NeuralODE{Chain{Tuple{Dense{typeof(relu),Array{Float32,2},Array{Float32,1}},Dense{typeof(relu),Array{Float32,2},Array{Float32,1}},Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}},Array{Float32,1},Flux.var"#12#14"{Chain{Tuple{Dense{typeof(relu),Array{Float32,2},Array{Float32,1}},Dense{typeof(relu),Array{Float32,2},Array{Float32,1}},Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}}},Tuple{Float32,Float32},Tuple{Tsit5},Base.Iterators.Pairs{Symbol,Real,NTuple{4,Symbol},NamedTuple{(:save_everystep, :reltol, :abstol, :save_start),Tuple{Bool,Float64,Float64,Bool}}}}) when calling gpu on the model.

Is there a way to run an ANODE on the GPU? I could very well be missing something.
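
For anyone trying to reproduce this, here is a minimal sketch of the kind of setup described (dimensions, data, and solver options are illustrative, not the exact docs example):

using Flux, DiffEqFlux, OrdinaryDiffEq, CuArrays

x = rand(Float32, 2, 64) |> gpu                        # toy batch of 2-d inputs on the GPU
dudt = Chain(Dense(3, 16, relu), Dense(16, 3)) |> gpu  # width 3 = input_dim + augment_dim
node = NeuralODE(dudt, (0.f0, 1.f0), Tsit5(),
                 save_everystep = false, save_start = false)
anode = AugmentedNDELayer(node, 1)
anode(x)  # passing GPU data here is where the reported ArgumentError shows up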

ChrisRackauckas commented 4 years ago

@avik-pal We are probably going to start deprecating the fast layers at some point, so it would be good to make sure everything works on standard Flux models as well.

SebastianCallh commented 4 years ago

To be clear: it does work with the standard Flux models when running on the CPU. It's when you try to move things to the GPU that it breaks down.

avik-pal commented 4 years ago

@SebastianCallh Here is a quick solution using Chain and Dense layers.

using Flux, DiffEqFlux, OrdinaryDiffEq, CuArrays  # imports assumed from the package versions listed below

diffeqarray_to_array(x) = reshape(gpu(x), size(x)[1:2])

function construct_model(out_dim, input_dim, hidden_dim, augment_dim)
    input_dim = input_dim + augment_dim
    node = NeuralODE(Chain(Dense(input_dim, hidden_dim, relu),
                           Dense(hidden_dim, hidden_dim, relu),
                           Dense(hidden_dim, input_dim)) |> gpu,
                     (0.f0, 1.f0), Tsit5(), save_everystep = false,
                     reltol = 1e-3, abstol = 1e-3, save_start = false) |> gpu
    node = augment_dim == 0 ? node : AugmentedNDELayer(node, augment_dim)
    return Chain((x, p=node.p) -> node(x, p),  # run the (augmented) neural ODE with explicit parameters
                 diffeqarray_to_array,
                 Dense(input_dim, out_dim)) |> gpu, node.p |> gpu
end

We need to move the neural network (inside the neural ODE) to the GPU manually. Could you test it and let me know if this solution works?
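
As a rough usage sketch (the dimensions and data here are illustrative, not from the original example):

model, ps = construct_model(1, 2, 16, 1)  # out_dim = 1, input_dim = 2, hidden_dim = 16, augment_dim = 1
x = rand(Float32, 2, 64) |> gpu           # a toy batch of 64 two-dimensional points on the GPU
y = model(x)                              # ANODE -> diffeqarray_to_array -> Dense head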

EDIT: added the diffeqarray_to_array function update

ChrisRackauckas commented 4 years ago

I think it would be worth updating the example in the documentation to show usage with GPUs, since that's probably the most common use case here.

SebastianCallh commented 4 years ago

@avik-pal Unfortunately this does not work either. Did it work for you? The output of Pkg.installed() is:

Dict{String,VersionNumber} with 18 entries:
  "CSV"                   => v"0.6.1"
  "CUDA"                  => v"0.1.0"
  "BSON"                  => v"0.2.6"
  "Distributions"         => v"0.23.4"
  "StatsPlots"            => v"0.14.6"
  "MLDatasets"            => v"0.5.2"
  "MLDataUtils"           => v"0.5.1"
  "Formatting"            => v"0.4.1"
  "CuArrays"              => v"2.2.2"
  "DiffEqFlux"            => v"1.16.0"
  "Plots"                 => v"1.4.3"
  "OrdinaryDiffEq"        => v"5.41.0"
  "Flux"                  => v"0.10.4"
  "IJulia"                => v"1.21.2"
  "ProgressMeter"         => v"1.3.1"
  "DifferentialEquations" => v"6.14.0"
  "DataFrames"            => v"0.20.2"
  "NNlib"                 => v"0.6.6"

The full error is

ArgumentError: cannot take the CPU address of a CuArrays.CuArray{Float32,2,Nothing}

Stacktrace:
 [1] unsafe_convert(::Type{Ptr{Float32}}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/julia-user/.julia/packages/CuArrays/YFdj7/src/array.jl:226
 [2] gemm!(::Char, ::Char, ::Float32, ::CuArrays.CuArray{Float32,2,Nothing}, ::Array{Float32,2}, ::Float32, ::Array{Float32,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/blas.jl:1167
 [3] gemm_wrapper!(::Array{Float32,2}, ::Char, ::Char, ::CuArrays.CuArray{Float32,2,Nothing}, ::Array{Float32,2}, ::LinearAlgebra.MulAddMul{true,true,Bool,Bool}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:597
 [4] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:169 [inlined]
 [5] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:208 [inlined]
 [6] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:160 [inlined]
 [7] adjoint at /home/julia-user/.julia/packages/Zygote/1GXzF/src/lib/array.jl:310 [inlined]
 [8] _pullback at /home/julia-user/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [9] Dense at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:122 [inlined]
 [10] Dense at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:133 [inlined]
 [11] applychain at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [12] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}, ::Array{Float32,2}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [13] applychain at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [14] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{typeof(diffeqarray_to_array),Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}, ::ODESolution{Float32,3,Array{CuArrays.CuArray{Float32,2,Nothing},1},Nothing,Nothing,Array{Float32,1},Nothing,ODEProblem{CuArrays.CuArray{Float32,2,Nothing},Tuple{Float32,Float32},false,CuArrays.CuArray{Float32,1,Nothing},ODEFunction{false,DiffEqFlux.var"#dudt_#99"{NeuralODE{Chain{Tuple{Dense{typeof(relu),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Dense{typeof(relu),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}},CuArrays.CuArray{Float32,1,Nothing},Flux.var"#12#14"{Chain{Tuple{Dense{typeof(relu),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Dense{typeof(relu),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}}},Tuple{Float32,Float32},Tuple{Tsit5},Base.Iterators.Pairs{Symbol,Real,NTuple{4,Symbol},NamedTuple{(:save_everystep, :reltol, :abstol, :save_start),Tuple{Bool,Float64,Float64,Bool}}}}},UniformScaling{Bool},Nothing,typeof(DiffEqFlux.basic_tgrad),Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,DiffEqBase.SensitivityInterpolation{Array{Float32,1},Array{CuArrays.CuArray{Float32,2,Nothing},1}},DiffEqBase.DEStats}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [15] applychain at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [16] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{var"#649#650",typeof(diffeqarray_to_array),Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [17] Chain at /home/julia-user/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:38 [inlined]
 [18] _pullback(::Zygote.Context, ::Chain{Tuple{var"#649#650",typeof(diffeqarray_to_array),Dense{typeof(identity),CuArrays.CuArray{Float32,2,Nothing},CuArrays.CuArray{Float32,1,Nothing}}}}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [19] loss_node at ./In[321]:59 [inlined]
 [20] _pullback(::Zygote.Context, ::typeof(loss_node), ::Array{Float64,2}, ::Array{Float64,2}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [21] adjoint at /home/julia-user/.julia/packages/Zygote/1GXzF/src/lib/lib.jl:179 [inlined]
 [22] _pullback at /home/julia-user/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [23] #17 at /home/julia-user/.julia/packages/Flux/Fj3bt/src/optimise/train.jl:89 [inlined]
 [24] _pullback(::Zygote.Context, ::Flux.Optimise.var"#17#25"{typeof(loss_node),Tuple{Array{Float64,2},Array{Float64,2}}}) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface2.jl:0
 [25] pullback(::Function, ::Zygote.Params) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface.jl:172
 [26] gradient(::Function, ::Zygote.Params) at /home/julia-user/.julia/packages/Zygote/1GXzF/src/compiler/interface.jl:53
 [27] macro expansion at /home/julia-user/.julia/packages/Flux/Fj3bt/src/optimise/train.jl:88 [inlined]
 [28] macro expansion at /home/julia-user/.julia/packages/Juno/tLMZd/src/progress.jl:134 [inlined]
 [29] train!(::typeof(loss_node), ::Zygote.Params, ::DataLoader, ::ADAM; cb::var"#651#652") at /home/julia-user/.julia/packages/Flux/Fj3bt/src/optimise/train.jl:81
 [30] top-level scope at ./In[321]:80

avik-pal commented 4 years ago

@SebastianCallh My bad, I forgot to post the updated diffeqarray_to_array function. Please try the example in #333; it contains the full code.
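
For reference, the stack trace above shows a plain CPU Array reaching a Dense layer whose weights are CuArrays, so whatever diffeqarray_to_array returns needs to stay on the GPU. A hypothetical variant along those lines (not necessarily the exact code in #333):

# Hypothetical sketch, not necessarily what #333 does: with save_everystep = false and
# save_start = false the solution holds a single saved state, which is already a CuArray,
# so return it directly instead of round-tripping through a CPU Array.
diffeqarray_to_array(x) = x.u[1]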

SebastianCallh commented 4 years ago

Thank you, the example in #333 does indeed work!

Out of curiosity, I read the discussion in #288 and wonder why

Or should we simply modify the NeuralODE type to store a parameter augment_dim, and dispatch based on that to use the current version if it is 0.

didn't get implemented. Are there technical challenges? It does sound like a very nice interface to me.

avik-pal commented 4 years ago

@SebastianCallh The main reasons I decided not to go forward with it are:

  1. It is not really an extensible design. Right now we only support 0-th order augmentation; to support higher orders we would have to redefine the types again.
  2. The current implementation also lets us work across all layers without modifying them, so if other DELayers are added in the future they should also work with AugmentedNDELayer (a simplified sketch of this wrapper pattern follows below).
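
For context, the wrapper approach described in point 2 boils down to roughly the following (a simplified sketch with a made-up name, not the actual DiffEqFlux implementation):

# Simplified, hypothetical sketch of the wrapper idea: pad every input with `adim`
# extra zero rows and delegate to the wrapped DE layer, so any DE layer gets
# augmentation without being modified itself.
struct AugmentedLayerSketch{L}
    layer::L     # any DE layer, e.g. a NeuralODE
    adim::Int    # number of augmented (zero-padded) dimensions
end

function (m::AugmentedLayerSketch)(x::AbstractMatrix, args...)
    pad = fill!(similar(x, m.adim, size(x, 2)), 0)  # zeros on the same device/eltype as x
    m.layer(vcat(x, pad), args...)
end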

SebastianCallh commented 4 years ago

Makes sense. Thank you for letting me know!