SciML / DiffEqFlux.jl

Pre-built implicit layer architectures with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods
https://docs.sciml.ai/DiffEqFlux/stable
MIT License

Multiple shooting example doesn't work #564

Closed: bkuwahara closed this issue 3 years ago

bkuwahara commented 3 years ago

The tutorial code for multiple shooting (https://diffeqflux.sciml.ai/dev/examples/multiple_shooting/) doesn't work for me. First, the second line, using DiffEqFlux: group_ranges, gives an error:

ERROR: UndefVarError: group_ranges not defined

I have DiffEqFlux installed and it works fine in all other respects.

So I deleted that line, as well as the plot_multiple_shoot function and the callback function which rely on it, and tried to run the code without any callback. Upon running the first training line, I get the error:

MethodError: no method matching multiple_shoot(::Vector{Float32}, ::Matrix{Float32}, ::StepRangeLen{Float32, Float64, Float64}, ::ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, false, Vector{Float32}, ODEFunction{false, var"#9#10", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::typeof(loss_function), ::Tsit5, ::Int64; continuity_term=200)

I ran into this error when trying to implement multiple shooting in my own code and was hoping the tutorial could give me some insight into the problem. Any idea what is going on here?

ChrisRackauckas commented 3 years ago

@adrhill ?

adrhill commented 3 years ago

Hi @bkuwahara,

Can you check which version of DiffEqFlux you are using by typing ]status DiffEqFlux? The cause of your issue might be an older version of DiffEqFlux, as multiple shooting with group_ranges was only recently merged in v1.37.0.
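
For reference, checking (and updating) from the Pkg REPL looks something like this; press ] at the julia> prompt (a sketch):

pkg> status DiffEqFlux    # shows the installed version
pkg> update DiffEqFlux    # group_ranges needs DiffEqFlux >= v1.37.0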

bkuwahara commented 3 years ago

Thanks, that was the problem. I have another question, though: when I try to apply the same thing to my own code, each of the groups always behaves in exactly the same way. I attached a gif of the training process (my_multiple_shooting) so you can see what I mean. My ODE, and hence my neural network, is different, but otherwise I followed the same procedure. I have to imagine something in my code is causing this, but I'm not sure what. Any ideas what might be wrong?

adrhill commented 3 years ago

That's interesting, it looks like all groups learn the same dynamics, even though they have different initial conditions. Maybe the state doesn't affect the output of your neural network? Could you share your network and your ODEProblem?

bkuwahara commented 3 years ago

I'm using an SIR model with vaccination. The real model is

function SIRx!(du, u, p, t) 
    δ, m, σ, κ, β, μ, γ = p
    S, I, x = u
    du[1] = μ*(1-x) - β*S*I - μ*S
    du[2] = β*S*I - (μ+γ)*I
    du[3] = κ*x*(1-x)*(-σ*m+I+δ*(2*x-1))
    nothing
end

The parameters are p_true = Float32.([0.55e-3, 0.5e-3, 1, 1.69, 280, 1/50, 365/22]), the initial conditions are u0 = [0.058496855, 0.00020796247, 0.7943508], and the solution is saved at tsteps = range(0, 20, step=0.5). The neural network is defined using

ann_ude=FastChain(FastDense(3, 20, tanh),  FastDense(20, 1, tanh), (x,p)->x[1])
p0 = DiffEqFlux.initial_params(ann_ude)

It's meant to learn the third equation governing the change in x. The setup is

function SIRx_ude!(du, u, p, t, p_)
    δ, m, σ, κ, β, μ, γ = p_   # same parameter ordering as SIRx! / p_true
    S, I, x = u
    du[1] = dS = μ*(1-x) - β*S*I - μ*S
    du[2] = dI = β*S*I - (μ+γ)*I
    du[3] = ann_ude(u, p)      # third equation replaced by the neural network
    nothing
end;

ude(du, u, p, t) = SIRx_ude!(du, u, p, t, p_true);
prob = ODEProblem(ude, u0, (tsteps[1], tsteps[end]), p0);

adrhill commented 3 years ago

Running your ODE, I see that the data you're trying to fit is within a very small interval:

julia> maximum(ode_data; dims=2)
3×1 Matrix{Float64}:
 0.06015703426605154
 0.00022351952775341233
 0.7943534717796061

julia> minimum(ode_data; dims=2)
3×1 Matrix{Float64}:
 0.058369736482097166
 0.00013155024652744172
 0.7943237417350401
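
(For context, ode_data here is your true SIRx! model solved and saved at the same tsteps; roughly what I ran, with the solver choice being just Tsit5 for the sketch:)

using OrdinaryDiffEq
prob_true = ODEProblem(SIRx!, u0, (tsteps[1], tsteps[end]), p_true)
ode_data = Array(solve(prob_true, Tsit5(); saveat = tsteps))   # 3 x length(tsteps) matrix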

I think this is the reason why the dynamics you obtain from your UDE look very similar in all groups. Normalizing the inputs in ann_ude might help, e.g. something like:

umin = minimum(ode_data; dims=2)
umax = maximum(ode_data; dims=2)
normalize(u) = (u .- umin) ./ (umax .- umin)   # elementwise, broadcasting against the 3×1 umin/umax

ann_ude = FastChain((x, p) -> normalize(x), FastDense(3, 20, tanh), FastDense(20, 1), (x, p) -> x[1])

(Maybe @ChrisRackauckas knows a more elegant way to do this...)

Normalizing your loss functions (the regular one and the kwarg continuity_loss in multiple_shoot) can also help.
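
For example, reusing the normalize helper above, a data-fitting loss that avoids in-place writes (and so stays AD-friendly) could look like this (a sketch; loss_normalized is just a name I'm making up here):

loss_normalized(data, pred) = sum(abs2, normalize(data) .- normalize(pred))

That can then be passed as the loss_function argument to multiple_shoot.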

Let me know if this fixes your problem!

bkuwahara commented 3 years ago

Much appreciated! Just normalizing the inputs to the NN makes the groups distinguishable, although they're still similar enough that the training doesn't really work. I tried normalizing the loss function (the regular one) and I get an error:

Mutating arrays is not supported

I've tried a number of different approaches, none of which actually involve mutating arrays as far as I can tell, so I'm not sure where this error comes from. Here's one of the versions I wrote which gave the error:

function loss(data, pred)
    d = similar(data)
    p = similar(pred)
    for i in 1:size(d, 1)
        d[i, :] = (data[i, :] .- umin[i]) ./ (umax[i] - umin[i])
        p[i, :] = (pred[i, :] .- umin[i]) ./ (umax[i] - umin[i])
    end
    return sum(abs2, d - p)
end

Is there a way to go about this that won't give me an error?

EDIT: I figured out how to do it. It seems like I had my eta value set much too high for the ADAM optimizer. Using a smaller value of eta and a larger group size makes a considerable difference. It's still not perfect, but I think I can get it to work from here.
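
For anyone else who runs into this, the change amounts to something like the following, where the exact numbers are only illustrative (a sketch, assuming Flux's ADAM, the loss_normalized sketch above, and the prob, p0, ode_data and tsteps defined earlier):

using Flux: ADAM

group_size = 10            # larger groups than before
continuity_term = 200

loss_ms(p) = multiple_shoot(p, ode_data, tsteps, prob, loss_normalized, Tsit5(),
                            group_size; continuity_term = continuity_term)

res = DiffEqFlux.sciml_train(loss_ms, p0, ADAM(0.001); maxiters = 300)   # smaller eta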

adrhill commented 3 years ago

@bkuwahara Can this issue be closed?