SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Optimizing over a neural network and a (time-independent) coefficient simultaneously #409

Closed · seadra closed this issue 4 years ago

seadra commented 4 years ago

Currently, we can parametrize our ODE using the output of a neural network and optimize over its parameters θ; this seems to be the usual path. Alternatively, we can optimize over coefficients, as in the Lotka-Volterra example.

Is it somehow possible to optimize over both a set of coefficients and a neural network simultaneously?

The reason I'm asking is that I have a matrix ODE, X'(t) = i[f(t) c; c -f(t)] X(t), where c is a constant and f(t) is the output of a neural network chain, and it turns out that a solution only exists when the final time is around some particular value. The problem is that it is very hard to find the correct tspan by trial and error.

This issue could be solved if there were a way to let the final time "breathe": introduce a scaling factor s so that the ODE becomes X'(t) = i s [f(t) c; c -f(t)] X(t), and have the optimizer figure out its proper value.

Perhaps this is already possible (I'm sorry if I'm missing something), but if not, this would be a very helpful feature.

ChrisRackauckas commented 4 years ago

Just split the vector, like is done when fitting the initial conditions. Let me know if you need more of a pointer than that.

seadra commented 4 years ago

Hi Chris,

Thanks for the reply! It's great to know this is readily possible, but yes, that was a bit terse. Is there an example of that? (I tried searching for fitting the initial conditions with DiffEqFlux but couldn't find anything.)

seadra commented 4 years ago

Just for concreteness, this is a simplified toy version of my code. I'm trying to either (1) add an overall scaling factor s to the return value of f_nn and let the optimizer figure out its value, or (2) somehow let the optimizer find an appropriate value of T.

using DiffEqSensitivity, OrdinaryDiffEq, Zygote, LinearAlgebra, FiniteDiff, Test, DiffEqFlux, Optim

const T = 10.0; # <--- can the optimizer figure out an appropriate value of T?
const ω = π/T;

ann = FastChain(FastDense(1,32,tanh), FastDense(32,32,tanh), FastDense(32,1))
p = initial_params(ann);

function f_nn(u, p, t)
    a = ann([t],p)[1];
    A = [1.0 a; a -1.0];
    return -im*A*u; # <--- or alternatively, can the optimizer figure out a time-independent scaling factor here?
end

u0 = [Complex{Float64}(1) 0; 0 1];

tspan = (0.0, T)

prob_ode = ODEProblem(f_nn, u0, tspan, p);
sol_ode = solve(prob_ode, Tsit5());

utarget = [Complex{Float64}(0) im; im 0];

function predict_adjoint(p)
  return solve(prob_ode, Tsit5(), p=Complex{Float64}.(p), abstol=1e-12, reltol=1e-12)
end

function loss_adjoint(p)
    prediction = predict_adjoint(p)
    usol = last(prediction)
    loss = 1.0 - abs(tr(usol*utarget')/2)^2
    return loss
end

DiffEqFlux.sciml_train(loss_adjoint, p, ADAM(0.1), maxiters = 100)
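
For reference, a minimal sketch of option (1): append a scaling factor s to the parameter vector and split it back out inside the ODE right-hand side. It reuses ann, u0, tspan, and utarget from the code above; the names p2, f_nn_scaled, prob_scaled, and loss_scaled and the initial guess for s are illustrative assumptions, not code from the docs.

p2 = [p; 1.0];  # network weights followed by an initial guess for s

function f_nn_scaled(u, p, t)
    s = p[end];                      # time-independent scaling factor being fit
    a = ann([t], p[1:end-1])[1];     # the network only sees its own slice of p
    A = [1.0 a; a -1.0];
    return -im*s*A*u;                # s lets the effective final time "breathe"
end

prob_scaled = ODEProblem(f_nn_scaled, u0, tspan, p2);

function loss_scaled(p)
    sol = solve(prob_scaled, Tsit5(), p=Complex{Float64}.(p), abstol=1e-12, reltol=1e-12)
    usol = last(sol)
    return 1.0 - abs(tr(usol*utarget')/2)^2
end

DiffEqFlux.sciml_train(loss_scaled, p2, ADAM(0.1), maxiters = 100)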

ChrisRackauckas commented 4 years ago

https://diffeqflux.sciml.ai/dev/examples/feedback_control/ has an explicit example where there's a neural network and two parameters optimized simultaneously.
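
For completeness, option (2), fitting T itself, can be sketched in a similar hedged way: integrate over a fixed (0.0, 1.0) span, store the final time at the end of the parameter vector, and evaluate the network at the rescaled physical time. This reuses ann, u0, p, and T from the toy code above; the names and the rescaling trick are an illustration, not taken from the linked example.

p3 = [p; T];  # network weights followed by an initial guess for the final time

function f_nn_T(u, p, t)
    Tf = p[end];                        # trainable final time
    a = ann([Tf*t], p[1:end-1])[1];     # evaluate the network at physical time Tf*t
    A = [1.0 a; a -1.0];
    return -im*Tf*A*u;                  # chain-rule factor from rescaling t -> Tf*t
end

prob_T = ODEProblem(f_nn_T, u0, (0.0, 1.0), p3);

The same loss function and sciml_train call as in the sketch above then fits the network weights and the final time together.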