SciML / DataDrivenDiffEq.jl

Data driven modeling and automated discovery of dynamical systems for the SciML Scientific Machine Learning organization
https://docs.sciml.ai/DataDrivenDiffEq/stable/
MIT License
406 stars 57 forks source link

Understanding controls #447

Open zahachtah opened 1 year ago

zahachtah commented 1 year ago

Dear package authors! I have much fun playing with this tool (and diffeqflux), this will be a game changer in my field also (ecology). I was wondering if I could get some advice on how to deal with controls, or independent (or external) variables. My concrete example is data from a lake, measuring nutrients, algae and zooplankton. I also have measurements of temperature and influx of nutrients. The later two are external to the system. I think I have managed to set up a system to make the data driven problem identification. I am not sure though, How to use the resulting system and plugging the external variables back in. I did a MWE to generate similar but simpler data:

function T(t)
        10*(1+sin(t/200)*0.4+0.4)
    end
    function Inp(t)
        0.05+0.1*(sin(t/100)*0.4+0.4)
    end
    function system(u,p,t)
        N=u[1]
        A=u[2]
        Z=u[3]
        inp=Inp(t)

        TA=exp(-(T(t)-10.0).^2/200)
        TZ=exp(-(T(t)-15.0).^2/200)
        dN=inp-N*0.05-0.4*N/(0.2+N)*A*TA
        dA=0.4*N/(0.2+N)*A*TA-0.1*A/(0.2+A)*Z*TZ-0.05*A
        dZ=0.1*A/(0.2+A)*Z*TZ-0.05*Z
        return [dN;dA;dZ]
    end
    z0 = [0.1; 0.1;0.1]
    ztspan = (0.0, 500.0) # this gets really slow for endtime>50....
    zprob = ODEProblem(system, z0, ztspan)
    zsol = solve(zprob, Tsit5(), saveat = 0.1);

    X = zsol[:, :]  .* (1.0.+ 0.02.*randn(rng, size(zsol)));
    ts = zsol.t;
    controls=vcat(Inp.(ts)',T.(ts)')

    prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel(),U = controls)

    #Three variables: N,A,Z and two controls: Inp and T
    @variables u[1:3] c[1:2]
    u = collect(u)
    c = collect(c)

    # just a start for basis exploration
    h = Num[polynomial_basis(u, 3); polynomial_basis(c, 3);exp(c[2])]

    basis = Basis(h, u, controls = c);
    println(basis) # hide

    sampler = DataProcessing(split = 0.8, shuffle = true, batchsize = 30, rng = rng)
    λs = exp10.(-10:0.1:0)
    opt = STLSQ(λs)
    res = solve(prob, basis, opt,
                options = DataDrivenCommonOptions(data_processing = sampler, digits = 1))
       sys = get_basis(res)

This gives me a system of equations. I am not entirely sure what the best way to get the parameters as an array is but I use:

P=ModelingToolkit.varmap_to_vars(get_parameter_map(get_basis(res)),parameters(get_basis(res)))

If I do eqns=sys(z0,P,0) I get

Screenshot 2023-01-23 at 16 47 11

My Question: How would I be able to get the controls, c1 and c2 into the equations as external variables so that I can run it as an ODE directly, assuming the controls are measured variables, controls and not the functions T(t) and Inc(t) at the top of the MWE.

I would want to run:

ODEProblem(eqns,z0,ztspan,P)

but driven by the variables in the matrix controls.

I hope this is reasonably clear as a question? Any recommendations for improvements are greatly appreciated!

Cheers, J

zahachtah commented 1 year ago

If someone has similar question, here is my current progress:

ok, so these are two problems:

1) is to make data into a function. I understand now that this is meant to be done with collocation, or interpolation.

Using the above MWE

collocate_data(controls,ts)

gives an error:

LinearAlgebra.SingularException(2) ldiv!(::Matrix{Float64}, ::LinearAlgebra.Diagonal{Float64, Vector{Float64}}, ::LinearAlgebra.Adjoint{Float64, Matrix{Float64}})

I can, however do interpolation, but only one at the time:

itp_method = InterpolationMethod(QuadraticSpline)
itp = itp_method(controls[1,:],ts)

which gives me a callable function

2) getting these functions back into the reconstructed system of equations:

this was solved here: https://github.com/SciML/DataDrivenDiffEq.jl/issues/432

A bit complicated the last step, but will report here if I get it to work.

AlCap23 commented 1 year ago

Hi!

Sorry for the delay, but here is what I would do:

using DataDrivenDiffEq
using OrdinaryDiffEq
using DataDrivenSparse
using Random 
using Plots
# Generate the data

function T(t)
    10*(1+sin(t/200)*0.4+0.4)
end

function Inp(t)
    0.05+0.1*(sin(t/100)*0.4+0.4)
end

function system(u,p,t)
    N=u[1]
    A=u[2]
    Z=u[3]
    inp=Inp(t)

    TA=exp(-(T(t)-10.0).^2/200)
    TZ=exp(-(T(t)-15.0).^2/200)
    dN=inp-N*0.05-0.4*N/(0.2+N)*A*TA
    dA=0.4*N/(0.2+N)*A*TA-0.1*A/(0.2+A)*Z*TZ-0.05*A
    dZ=0.1*A/(0.2+A)*Z*TZ-0.05*Z
    return [dN;dA;dZ]
end

rng = Random.default_rng()

z0 = [0.1; 0.1;0.1]
ztspan = (0.0, 500.0) # this gets really slow for endtime>50....
zprob = ODEProblem(system, z0, ztspan)
zsol = solve(zprob, Tsit5(), saveat = 0.1);

X = zsol[:, :]  .* (1.0.+ 0.002.*randn(rng, size(zsol)));
ts = zsol.t;
controls=vcat(Inp.(ts)',T.(ts)')

prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel(),U = controls)

# I always look at the data
plot(prob)

#Three variables: N,A,Z and two controls: Inp and T
@variables t
@variables u(t)[1:3] c[1:2]
D = Differential(t)

u = collect(u)
c = collect(c)
#du = collect(D.(u))

# just a start for basis exploration
# Here 1 is duplicated if I do not remove it manually
h = Num[polynomial_basis(u, 3)[2:end]; polynomial_basis(c, 3);exp(c[2])]

basis = Basis(h, u, controls = c);
println(basis) # hide

sampler = DataProcessing(split = 0.8, shuffle = false, batchsize = 200, rng = rng)
λs = exp10.(-10:0.1:1)

opt = ADMM(λs)

res = solve(prob, basis, opt,
            options = DataDrivenCommonOptions(data_processing = sampler, digits = 5))

sys = get_basis(res)
println(sys)

# Generate a closure on the system 
f_recovered = let control_T = T, control_U = Inp
    (x, p, t) -> sys(x, p, t, [control_T.(t); control_U.(t)])
end

# Optimal parameters
p_opt = get_parameter_values(sys)

# Try to predict the system
prediction_prob = ODEProblem(f_recovered, z0, ztspan, p_opt)
prediction = solve(prediction_prob, Tsit5())
plot(prediction)
plot!(zsol)

You should think about using ImplicitOptimizer(STLSQ(...)) if you're aiming for full recovery. If this is an attempt to find a simpler model, never mind :)

AlCap23 commented 1 year ago

Note that the closure can use any stuff here. This is basically generating a similar expression to what you did above in the original system, but written as a chain of two functions.