Using DataDrivenSolution for forward modelling

MortezaBabazadehShareh commented 1 year ago

After we have solved a DEProblem, how can we use the DataDrivenSolution to produce values by means of various time ranges? Suppose that we modelled SIR data while we used 350 days of records, now how can we produce SIR numbers for 400 days?

AlCap23 commented 1 year ago

The solution is a callable struct with the signature

solution(u, p, t, [c])

where u are the states, p the parameters, t the time and c are control inputs / exogenous signals. So if you want to simulate, you could just pass the solution to the solver ( when no controls apply ) or embed it with a control function

function recovered_dynamics(u,p,t)
    c = control_function(u,p,t) # This needs to be defined by the user and should return a vector
    return solution(u,p,t,c)
end

A similar example can be found in the showcase of automatic model discovery

MortezaBabazadehShareh commented 1 year ago

Whenever I define a ContinuousDataDrivenProblem, I can use the DataDrivenSolution to recover the dynamic and simulate. But every time I define a DiscreteDataDrivenProblem and try to recover the dynamic, the simulation is wrong!

This link shows one sample of the problem, using simulated SIR data: https://github.com/MortezaBabazadehShareh/MortezaBabazadehShareh/blob/main/simulated_SIR.jl

ChrisRackauckas commented 1 year ago

Can you share the plots so we can see what the results are like?

ChrisRackauckas commented 1 year ago

What's the loss like? What's the result like in the fitted region on the fitted parameters?

MortezaBabazadehShareh commented 1 year ago

Plot

ChrisRackauckas commented 1 year ago

That looks pretty good. What model is found by SINDy? Are the parameters correct and all?

MortezaBabazadehShareh commented 1 year ago

This is the result, but I can not simulate or predict. The ODEproblem or DiscreteProblem are unstable. I use the following code that works properly for ContinuousDataDrivenProblems.

function ddsol!( u, p, t) return system(u, p, t) end

u0=data[:,1] estimation_prob = ODEProblem(ddsol!, u0, tspan, p=get_parameter_values(system)) estimate = solve(estimation_prob, Tsit5(), saveat = 1)

MortezaBabazadehShareh commented 1 year ago

Actually the found model is not correct, based on simulated SIR model.

Difference(t; dt=1.0, update=false)(x(t)) = p₁ + p₂x(t) + p₃(x(t)^2) + p₄y(t) + p₆z(t) + p₉(z(t)^2) + p₅x(t)y(t) + p₇x(t)z(t) + p₈y(t)z(t) Difference(t; dt=1.0, update=false)(y(t)) = p₁₀ + p₁₂(x(t)^2) + p₁₅(y(t)^2) + p₁₉(z(t)^2) + p₁₁x(t) + p₁₃y(t) + p₁₆z(t) + p₁₄x(t)y(t) + p₁₇x(t)z(t) + p₁₈y(t)z(t) Difference(t; dt=1.0, update=false)(z(t)) = p₂₀ + p₂₁x(t) + p₂₂(x(t)^2) + p₂₅(y(t)^2) + p₂₉(z(t)^2) + p₂₃y(t) + p₂₆z(t) + p₂₄x(t)y(t) + p₂₇x(t)z(t) + p₂₈y(t)*z(t)

ChrisRackauckas commented 1 year ago

that would be the cause then. Did you subsample the data or adjust the thresholds?

MortezaBabazadehShareh commented 1 year ago

Yes, I have changed the basis, the lambda, and in some cases the model found by Sindy was more reasonable. The point is, in every situation when the DataDrivenProblem was discrete, I could not use the ddsol. I have tested this with different datasets. I feel the problem is in the way that I am using ddsol.

AlCap23 commented 1 year ago

Sorry for being late to the party.

I've adjusted the code and get the following result ( without digging too deep in hyper parameter optimisation etc ):

States : x(t) y(t) z(t)
Parameters : 11
Independent variable: t
Equations
Differential(t)(x(t)) = p₁*y(t) + p₃*(y(t)^2) + p₂*x(t)*y(t) + p₄*y(t)*z(t)
Differential(t)(y(t)) = p₆*(y(t)^2) + p₅*x(t)*y(t) + p₇*y(t)*z(t)
Differential(t)(z(t)) = p₁₀*(y(t)^2) + p₈*y(t) + p₁₁*y(t)*z(t) + p₉*x(t)*y(t)

It is not the groundtruth, but only 11 / 30 possible terms. All of which have a very similar scale. Batching might help here.

using DataDrivenDiffEq
using DataDrivenSparse
using ModelingToolkit, OrdinaryDiffEq, LinearAlgebra, Plots

gr()

## create a test problem from SIR system
function SIR(u, p, t)
    S, I, R= u
    Ṡ=-0.03*S*I
    İ=0.03*S*I-0.01*I
    Ṙ=0.01*I
    return [Ṡ, İ, Ṙ]
end

u0=[0.9999; .0001; 0.0]
steps=1_000
tspan=(0, steps)
dt=1

######################################################################
###################### Continuous Form  ##############################
######################################################################

problem=ODEProblem(SIR, u0, tspan)
data_SIR=solve(problem, Tsit5(), saveat=dt)#, atol=1e-7, rtol=1e-8)

tt=collect(0:1:steps)
plot(tt,data_SIR[1,:], label="Suseptible")
plot!(tt,data_SIR[2,:], label="Infected")
plot!(tt,data_SIR[3,:], lael="Removed")

######################################################

### definition of the problem
# We use the underlying solution here directly
sir_problem=DataDrivenProblem(data_SIR)

### definition of polynomial basis

@variables t x(t) y(t) z(t)
u = [x;y;z]
basis = Basis(polynomial_basis(u, 2), u, iv = t)

### choosing STLSQ as an optimizer
### STLSQ is a sparsifying algorithm that cause the solve function to call its "Sindy" method ###

opt = STLSQ(exp10.(-10:0.1:2))

### solving the problem to extract the model

ddsol = solve(sir_problem, basis, opt,  options = DataDrivenCommonOptions(digits = 3, normalize = DataNormalization(ZScoreTransform)))

### final differential equations

println(get_basis(ddsol))

### coefficients in the above differential equations

system=get_basis(ddsol);
params=get_parameter_map(system)

### plot to see the accuracy
plot(plot(sir_problem, title="data"),plot(ddsol, title="model"), layout=(1,2))

####################### Simulation #######################

dudt = let b = get_basis(ddsol)
    (u,p,t) -> b(u,p,t)
end

estimation_prob = ODEProblem(dudt, u0, tspan, get_parameter_values(system))
estimate = solve(estimation_prob, Tsit5(), saveat = 1)
plot(estimate, label=["Suseptible" "Infected" "Removed"], color=[:blue :red :green])
plot!(data_SIR, ls=:dash, lw=2,  label=["Suseptible" "Infected" "Removed"], color=[:blue :red :green])

new_steps=steps+100
tspan=(0,new_steps)
estimation_prob = ODEProblem(dudt, u0, tspan, get_parameter_values(system))
estimate = solve(estimation_prob, Tsit5(), saveat = 1)
problem=ODEProblem(SIR, u0, tspan)
data_new=solve(problem, Tsit5(), saveat=dt)#, atol=1e-7, rtol=1e-8)
plot(estimate, label=["Suseptible" "Infected" "Removed"], color=[:blue :red :green])
plot!(data_new, ls=:dash, lw=2,  label=["Suseptible" "Infected" "Removed"], color=[:blue :red :green])

#########################################################################
############################### Discrete Form ###########################
#########################################################################

u0=[0.9999; .0001; 0.0]
steps=1_000
tspan=(0, steps)
dt=1
# I think this is inherently wrong, discrete should differ from continuous form

problem=DiscreteProblem(SIR, u0, tspan)

# Use functionmap here instead of tsit.
data_SIR=solve(problem, FunctionMap(), saveat=dt)#, atol=1e-7, rtol=1e-8)

First, I used the right problem constructor ( directly from the DESol ). Secondly, I normalised the data, given that the value range was quite low. Third, I think the Discrete Solution was never discrete, given that you use the same set of DE here and - I might be wrong here - this is not the right one for a discrete SIR model.

Cheers!

MortezaBabazadehShareh commented 1 year ago

Thanks for your help

SciML / DataDrivenDiffEq.jl

Using DataDrivenSolution for forward modelling #448