SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Forward solve fails with LinearAlgebra.Adjoint initial values #85

Closed jessebett closed 5 years ago

jessebett commented 5 years ago

Here's a quick toy with 1D state and batch dimension.

using Flux
using DiffEqFlux
using MLDataUtils
using Distributions
using OrdinaryDiffEq

D = 1
N = 10000
BS = 100

train_x = rand(Uniform(0,5),(D,N))
train_y = train_x .^ 2
batches = batchview((train_x,train_y),BS)

nn = Chain(Dense(1,10,tanh),Dense(10,10,tanh),Dense(10,1))
tspan = (0.0f0,1.0f0)
model = x-> neural_ode(nn,x,tspan,Tsit5(),reltol=1e-6,abstol=1e-6)[:,:,end]

I thought I could create a single batch element by transposing a singleton array. However, a LinearAlgebra.Adjoint initial value breaks the forward solve in neural_ode:

julia> x0 = rand(1,1)
1×1 Array{Float64,2}:
 0.49820519072539593

julia> size(x0)
(1, 1)

julia> model(x0)
Tracked 1×1 Array{Float64,2}:
 0.3671767104193268

julia> x1 = rand(1)'
1×1 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 0.2720790231482093

julia> size(x1)
(1, 1)

julia> model(x1)
ERROR: MethodError: Cannot `convert` an object of type Array{Float32,2} to an object of type LinearAlgebra.Adjoint{Float64,Array{Float64,1}}
Closest candidates are:
  convert(::Type{LinearAlgebra.Adjoint{T,S}}, ::LinearAlgebra.Adjoint) where {T, S} at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/adjtrans.jl:186
  convert(::Type{T}, ::T) where T<:AbstractArray at abstractarray.jl:14
  convert(::Type{T}, ::LinearAlgebra.Factorization) 

If I collect first, it's fine:

julia> x2 = rand(1)'|>collect
1×1 Array{Float64,2}:
 0.22926764437395208

julia> size(x2)
(1, 1)

julia> model(x2)
Tracked 1×1 Array{Float64,2}:
 0.07379717434344342
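
As a side note (my illustration, not from the original report): collect helps here because it materializes the lazy Adjoint wrapper into a plain dense Matrix, which is the array type the rest of the pipeline produces:

```julia
using LinearAlgebra

x = rand(1)'           # lazy 1×1 Adjoint wrapper around a Vector
typeof(x)              # LinearAlgebra.Adjoint{Float64,Array{Float64,1}}
typeof(collect(x))     # Array{Float64,2} — an ordinary dense matrix
```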

This is surprising because LinearAlgebra.Adjoint plays well with normal solves:

u0 = randn(1,1)
tspan = (0.0,1.0)
prob = ODEProblem(f,u0,tspan)   # f: an ODE right-hand side, defined elsewhere
sol = solve(prob,Tsit5(),reltol=1e-8,abstol=1e-8)[:,end]
# > 1×1 Array{Float64,2}:
#     1.211417535470834

u0 = randn(1)'
tspan = (0.0,1.0)
prob = ODEProblem(f,u0,tspan)
sol = solve(prob,Tsit5(),reltol=1e-8,abstol=1e-8)[:,end]
# > 1×1 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
#     -1.1428152078709033

ChrisRackauckas commented 5 years ago

rand(1,1)' works though. Hmm....

ChrisRackauckas commented 5 years ago
x1 = rand(1)'
nn(x1)

gives a Matrix instead of an Adjoint, so the network's output type no longer matches the initial-condition type. DiffEq assumes the state keeps the same array type throughout the solve.
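
To make the mismatch concrete (a sketch with made-up weights, not the thread's code): Dense computes W*x .+ b, and multiplying a plain Matrix by an Adjoint yields a plain Matrix, so the wrapper type is dropped at the very first layer:

```julia
using LinearAlgebra

# Hypothetical weights standing in for the first Dense(1,10) layer
W = randn(10, 1)
b = randn(10)

x1 = rand(1)'        # 1×1 Adjoint, as in the failing example
y  = W * x1 .+ b     # generic matmul: result is a plain 10×1 Matrix
typeof(y)            # Array{Float64,2}
```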

nn = Chain(Dense(1,10,tanh),Dense(10,10,tanh),Dense(10,1),x->vec(x)')

works
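
That works because the appended x -> vec(x)' re-wraps the network's final Matrix output as an Adjoint row, restoring the initial condition's array type (my illustration of the mechanism):

```julia
using LinearAlgebra

y  = rand(1, 1)      # Matrix output of the last Dense layer
y2 = vec(y)'         # flatten, then re-adjoint: a 1×1 Adjoint again
typeof(y2)           # LinearAlgebra.Adjoint{Float64,Array{Float64,1}}
```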

jessebett commented 5 years ago

You're right, that is where the issue is. Thanks @ChrisRackauckas