SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

FastChain in loss #198

Closed · mkalia94 closed 4 years ago

mkalia94 commented 4 years ago

Hi,

I was following the "Universal Differential Equations for Neural Optimal Control" example and wanted to include the neural network ann as an extra term in the loss, as follows:

using DiffEqFlux, Flux, Optim, OrdinaryDiffEq

u0 = Float32(1.1)
tspan = (0.0f0,25.0f0)

ann = FastChain(FastDense(2,16,tanh), FastDense(16,16,tanh), FastDense(16,1))
p1 = initial_params(ann)
p2 = Float32[0.5,-0.5]
p3 = [p1;p2]
θ = Float32[u0;p3]

function dudt_(du,u,p,t)
    x, y = u
    du[1] = ann(u,p[1:length(p1)])[1]  # neural network term
    du[2] = p[end-1]*y + p[end]*x      # known linear dynamics
end
prob = ODEProblem(dudt_,u0,tspan,p3)
concrete_solve(prob,Tsit5(),[0f0,u0],p3,abstol=1e-8,reltol=1e-6)

function predict_adjoint(θ) # θ[1] is the initial condition, θ[2:end] are the parameters
  Array(concrete_solve(prob,Tsit5(),[0f0,θ[1]],θ[2:end],saveat=0.0:1:25.0))
end
# The second term applies ann to the full 2x26 solution array:
loss_adjoint(θ) = sum(abs2,predict_adjoint(θ)[2,:].-1) + sum(abs2,ann(predict_adjoint(θ),θ[2:end]))
l = loss_adjoint(θ)

cb = function (θ,l)
  println(l)
  #display(plot(solve(remake(prob,p=Flux.data(p3),u0=Flux.data(u0)),Tsit5(),saveat=0.1),ylim=(0,6)))
  return false
end

# Display the ODE with the current parameter values.
cb(θ,l)

loss1 = loss_adjoint(θ)
res = DiffEqFlux.sciml_train(loss_adjoint, θ, BFGS(initial_stepnorm=0.01), cb = cb)

However, this results in the following error:

ERROR: LoadError: ArgumentError: number of columns of each array must match (got (1, 26))

which I do not understand. Any help would be really appreciated! Thanks for the great package :)

The packages currently in my environment are:

  [c7e460c6] ArgParse v1.0.1
  [6e4b80f9] BenchmarkTools v0.4.3
  [3895d2a7] CUDAapi v3.1.0
  [c5f51814] CUDAdrv v6.0.0
  [be33ccc6] CUDAnative v2.10.2
  [5ae59095] Colors v0.9.6
  [3a865a2d] CuArrays v1.7.2
  [2b5f629d] DiffEqBase v6.16.0
  [aae7a2af] DiffEqFlux v1.3.1
  [0c46a032] DifferentialEquations v6.11.0
  [31c24e10] Distributions v0.21.12
  [7876af07] Example v0.5.3
  [5789e2e9] FileIO v1.2.2
  [587475ba] Flux v0.10.1
  [0c68f7d7] GPUArrays v2.0.1
  [4e3cecfd] ImageShow v0.2.3
  [a98d9a8b] Interpolations v0.12.8
  [429524aa] Optim v0.20.1
  [1dea7af3] OrdinaryDiffEq v5.28.1
  [91a5bcdd] Plots v0.29.1
  [c3572dad] Sundials v3.8.2
  [e88e6eb3] Zygote v0.4.7
mkalia94 commented 4 years ago

I have a scrappy solution, which is to replace the loss by:

function loss_adjoint(θ)
    p = predict_adjoint(θ)
    l = 0
    # Apply ann column by column instead of to the whole solution Matrix
    for i = 1:size(p,2)
        l = l + sum(abs2, ann(p[:,i], θ[2:end]))
    end
    l = l + sum(abs2, p[2,:] .- 1)
    return l
end

Is there a better way to do this?

ChrisRackauckas commented 4 years ago

This is a duplicate of https://github.com/JuliaDiffEq/DiffEqFlux.jl/issues/192 . Essentially, FastChain can act on a Vector, but it doesn't work so well on a Matrix, even though neural networks generally "can" work on a matrix. The definition of the backpass at https://github.com/JuliaDiffEq/DiffEqFlux.jl/blob/master/src/fast_layers.jl#L43-L54 hasn't been matrix-proofed, but if someone has about 30 minutes to dig through it, it shouldn't be too difficult: I believe it just needs a sum(?,dims=2) on one of the pullback terms.
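To make the suggested change concrete, here is a minimal standalone sketch of a dense-layer pullback that handles Matrix input. This is illustrative only, not the actual fast_layers.jl source (the real FastDense packs W and b into the flat parameter vector p, and all names here are made up); the key point is the sum over dims=2, since a Matrix input carries one sample per column and the bias gradient has to be summed over those columns:

function dense_with_pullback(W, b, x)
    r = W * x .+ b                # x can be a Vector (one sample) or a Matrix (batch in columns)
    y = tanh.(r)
    function pullback(ȳ)
        z̄ = ȳ .* (1 .- y .^ 2)    # chain rule through tanh
        W̄ = z̄ * x'                # out-by-in for Vector and Matrix x alike
        b̄ = vec(sum(z̄, dims=2))   # sum over batch columns; omitting this breaks the Matrix case
        x̄ = W' * z̄
        return W̄, b̄, x̄
    end
    return y, pullback
end

# A 26-column batch, matching the (1, 26) shape from the error above:
W = randn(Float32, 1, 2); b = randn(Float32, 1)
y, back = dense_with_pullback(W, b, randn(Float32, 2, 26))
W̄, b̄, x̄ = back(ones(Float32, 1, 26))  # b̄ has length 1, collapsed over the batch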

mkalia94 commented 4 years ago

Thanks. Also, the quick fix above doesn't work if ann maps to a higher dimension. Say

ann = FastChain(FastDense(2,16,tanh), FastDense(16,16,tanh), FastDense(16,2))

Then looping in the new loss with the initialization l = [0;0] results in Zygote's 'Mutating arrays is not supported' error. Is there a way around it?

mkalia94 commented 4 years ago

The above issue can be fixed with

sum(abs2,hcat([ann(p[:,i],θ[2:end]) for i in 1:length(p[1,:])]...))

I don't have much experience with Julia, so I won't be able to resolve the FastChain action on a Matrix, but this is a quick fix for now.
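As a side note, the same expression can also be written with reduce instead of splatting one argument per column into hcat; this assumes the installed Zygote can differentiate reduce(hcat, ...), which may not hold on older versions:

sum(abs2, reduce(hcat, [ann(p[:,i], θ[2:end]) for i in 1:size(p,2)]))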