SciML / DataDrivenDiffEq.jl

Data driven modeling and automated discovery of dynamical systems for the SciML Scientific Machine Learning organization
https://docs.sciml.ai/DataDrivenDiffEq/stable/
MIT License
407 stars 57 forks source link

ContinuousDataDrivenProblem: Difficulty in forming DX with ODEProblem and DAEProblem #323

Open finmod opened 2 years ago

finmod commented 2 years ago

To run ContinuousDataDrivenProblem on the model below that I want to run as an Implicit system discovery, I get : MethodError: objects of type ODESystem are not callable in forming the DX matrix. This happens with both ODEProblem and DAEProblem.

using DataDrivenDiffEq using LinearAlgebra using ModelingToolkit using DifferentialEquations: solve using NonlinearSolve using OrdinaryDiffEq, Sundials using Plots; gr() using Plots.PlotMeasures using Random using Symbolics: scalarize using Latexify import ModelingToolkit: Interval, infimum, supremum

ModelingToolkit.@parameters begin

  η=500                          # η -> Inf Leontief production function; η -> 0 Codd-Douglas; η -> -1 linear
  C=0.33                         # factor productivity
  b=0.135                        # capital intensity of production
  α=0.02                         # productivity growth rate
  β=0.01                         # population growth rate
  δ=0.05                         # depreciation rate; other values δ=0.10
  r=0.03                         # real interest rate
  Φ₁=0.04/(1. - 0.04^2)          # Phillips'curve parameter intercept 0.04/(1. - 0.04^2)
  Φ₂=0.04^3/(1. - 0.04^2)        # Phillips'curve parameter slope 0.04^3/(1. - 0.04^2)
  Φ₃=50                          # Phillips'curve parameter
  γ=0.8                          # (1 - γ) degree of money illusion: ther values γ=0.96
  η₁=0.35                        # adjustment speed for prices; other values η₁=0.45
  m=1.6                          # markup value
  η₂=0.2                         # substitution parameter of consumption share
  ξ₁=0.9                         # Upper right asymptote of Generalized logistic function https://en.wikipedia.org/wiki/Generalised_logistic_function 
  ξ₂=0.8                         # range from ξ₂= 1^(-13) Gomperts to ξ₂=1 logistic 
  ξ₃=-5.0                        # 
  e₁=0.0                         # share of equity held by banks 

end

@variables begin t ω(t) = 0.75 λ(t) = 0.90 d₁(t) = 0.5 ι(t) y₁(t) = 0.735 Y(t) = 100.0 Φ(t) ξ(t) g(t)
end

D = Differential(t) tmax = 100.0

domains = [t ∈ Interval(0.0, tmax), ω ∈ Interval(0.0, 1.0), λ ∈ Interval(0.0, 1.0), d₁ ∈ Interval(0.0, 1.8), y₁ ∈ Interval(0.0, 1.0), Y ∈ Interval(0.0,10.0^5)]

Φ = -Φ₁ + Φ₂/((1. - λ)^2) ι = η₁(mω − 1.) ξ = ξ₁/((1. + ξ₂exp(ξ₃y₁))^(1/η₂)) g = (1. - ξ)C((1 - ω)/b)^(1/η) - δ - ω/((1 - ω)(1 + η))(Φ - α - (1 - γ)*ι)

sfc = [ D(ω) ~ ω((η/(1 + η))(Φ - α - (1 - γ)ι)) D(λ) ~ λ(((1. - ξ)C((1 - ω)/b)^(1/η) - (δ + β + α) -(1/((1 + η)(1 - ω)))(Φ - α - (1 - γ)ι))) D(d₁) ~ d₁(r - g - ι) + ξ - ω - re₁ D(Y) ~ Yg

  1. ~ y₁ - ω + r*d₁ ] ModelingToolkit.@named keen = ModelingToolkit.ODESystem(sfc);

keen_simplified = structural_simplify(keen)

tspan = (0.0, tmax) dt=0.25 mmprob = ODEProblem(keen_simplified, [], tspan) sol = solve(mmprob, Rodas4(),abstol=1/10^14,reltol=1/10^14,saveat=dt);

du = mmprob.f(mmprob.u0,mmprob.p,0.0) du0 = D.(states(keen_simplified)) .=> du daeprob = DAEProblem(keen_simplified,du0,[],tspan) ref_sol = solve(daeprob,IDA(),abstol=1/10^14,reltol=1/10^14,saveat=dt);

X = ref_sol[:,:] DX = similar(X) for (i, xi) in enumerate(eachcol(X)) DX[:, i] = keen_simplified(xi, [], ref_sol.t[i]) end ts = ref_sol.t

ddprob = ContinuousDataDrivenProblem(ref_sol) ddprob = ContinuousDataDrivenProblem( X , ts, DX = DX[1:3, :] )

gives the error:

MethodError: objects of type ODESystem are not callable

Stacktrace: [1] top-level scope @ .\In[10]:4 [2] eval @ .\boot.jl:373 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base .\loading.jl:1196

AlCap23 commented 2 years ago

I will check it out later on, but I think the line

for (i, xi) in enumerate(eachcol(X))
DX[:, i] = keen_simplified(xi, [], ref_sol.t[i])
end

spits out the error, because keen_simplified is a ODESystem and not a function.

So you should just use the ref_solution or the interpolated states for creating DX.

finmod commented 2 years ago

There are quite a few problems here.

First, there is an issue with keen_simplified = structural_simplify(keen) that fails. I will raise it in MTK and bypass it using keen instead.

Second, the lack of function is solved by using InterpolationMethod()

X = ref_sol[:,:] t = ref_sol.t

ddprob = ContinuousDataDrivenProblem(X, t, InterpolationMethod())

Continuous DataDrivenProblem{Float64}

Last, the model discovery can start as an implicit system but I am lacking sufficient docs on setting up the discovery process.

From the model described above, the basis should include a polynomial (degree 2 or 3) of @variables t ω(t) λ(t) d₁(t) y₁(t) , the variables in X and an implicit basis formed with D(ω), D(λ), D(d₁) and D(ω)/ω, D(λ)/λ, D(d₁)/d₁. How to do this?

Next, the use of ImplicitOptimizer(STLSQ()) is a two-step solution where the explicit problem is solved first with STLSQ. Will the basis defined above work?

AlCap23 commented 2 years ago

Sorry I have a lot of deadlines this week plus some additional project work on top.

Just a quick question w.r.t. the implicit identification: is the process description in the docs ( nonlinear / implicit : cartpole ) not sufficient? If not, what exactly is your problem with setting up the basis? I am asking to get an understanding what I can improve upon :).

Second: you need to define the partial derivatives as additional signals and derive before the datadriven problem. A simple way to include this would be to just use vcat on those derived properties ( either use Interpolation for u(x) and take the derivative or a DiffEqOperator) or add them as inputs of the basis. Then tag them via the implicit kwarg in the function argument.

Third the ordering of the data can be done in one or two ways: either include just a single trajectory of a discretized state of the PDE and assume the overall structure or vcat all of the signals as a single vector ( u[1], u[2] etc, where 1,2 are the discretized states ). Be aware that this might mess up the conditioning of the problem ( I expirenced this whiile working on a MWE to automate this ).

Note that a fully automatic way of doing thing is not possible right now, because even though we can write down partial derivatives they are just symbols in terms of data and not further processed.

finmod commented 2 years ago

"Just a quick question w.r.t. the implicit identification: is the process description in the docs ( nonlinear / implicit : cartpole ) not sufficient? If not, what exactly is your problem with setting up the basis? I am asking to get an understanding what I can improve upon :)."

The weakness in the docs is about forming and using X and DX throughout DataDrivenDiffEq. There should be a full page on X and DX from different angles: synthetic data source, external data source along the line of the Hudson Bay example of the UDE repository. The diagrams from https://royalsocietypublishing.org/doi/10.1098/rspa.2020.0279 may be a better explanation. This rejoins https://github.com/SciML/DataDrivenDiffEq.jl/issues/317 . The signature of ContinuousDataDrivenProblem could be improved to give name and dimensions of problem. In short, a DAE example in the UDE paper would solve all of these issues.

Concerning the definition of basis, the model above contains polynomial of low order of the states with some of the states of negative order (-1 for 1/x or -2 for 1/x^2). My question is: how do you handle this type of basis? as ratio of polynomials, as differential of polynomials or as polynomials of changed variables.

In short, replacing the cartpole tutorial by the model above or the AkzoNobel benchmark https://benchmarks.sciml.ai/html/DAE/ChemicalAkzoNobel.html suggested to me by @ChrisRackauckas would be a better tutorial of the MTK , DAEProblem and ContinuousDataDrivenProblem sequence.

AlCap23 commented 2 years ago

Thanks for your reply and the detailed explanation!

The weakness in the docs is about forming and using X and DX throughout DataDrivenDiffEq. There should be a full page on X and DX from different angles: synthetic data source, external data source along the line of the Hudson Bay example of the UDE repository. The diagrams from https://royalsocietypublishing.org/doi/10.1098/rspa.2020.0279 may be a better explanation. This rejoins #317 .

I think I'll implement the UDE examples as well in the docs, to make this more clear and provide a better workflow. Right now, the docs in general is a clear weakness of the overall package, since I am in too deep to provide an unbiased view and have a lot of day to day work, which is more on the dev side than the exposed usage and API. But adding the Lotka Volterra Examples form the UDE paper is an easy goal.

The signature of ContinuousDataDrivenProblem could be improved to give name and dimensions of problem.

There is definitely no harm in naming a problem, but its not clear to me what is the clear benefit in terms of usability of the package. I ( again, limited perspective ) assume that a user would either use a single problem per dataset in the recovery of the unknown equations or track them individually ( if multiple problems are created. )

In short, a DAE example in the UDE paper would solve all of these issues.

Along the lines of visibility it would have helped. At the time the UDE paper took shape, the implicit identification did not work reliably or was pretty new. I agree that more examples - at least in the docs - might be useful, but I think right now the UDE paper is dormant.

Concerning the definition of basis, the model above contains polynomial of low order of the states with some of the states of negative order (-1 for 1/x or -2 for 1/x^2). My question is: how do you handle this type of basis? as ratio of polynomials, as differential of polynomials or as polynomials of changed variables.

Within the scope of DataDrivenDiffEq, everything is handled as it would be inside ModelingToolkit, SymbolicUtils or Symbolics with the exception of detecting linear dependent terms inside the basis ( on demand ). I think its just transformed as is, so no further transforms are made and the resulting code might be as (in)efficient as one formulates the basis.

A simple example:

# Let X, DX, t be defined by some unknown process
prob = ContinuousDataDrivenProblem(X, t, DX =DX)

@variables x[1:size(x, 1)] t

basis = Basis([x; x.^2; inv(x); inv(x.^2)], x, iv = t)

Would define a basis in terms of X. If I now want to include a transformed variables, I can either do this as a new state Z = [x[1], x[2], x[1]^(-1)]

Z = hcat(X, inv.(X[1:1,:]))

prob = ContinuousDataDrivenProblem(Z, t, DX =DX)

@variables z[1:size(z, 1)] t

basis = Basis([z; z.^2], z, iv = t)

Or as an external signal via the inputs u


u(x, p, t) = inv(x[1])

prob = ContinuousDataDrivenProblem(X, t, DX =DX, U = u)

@variables x[1:size(x, 1)] t u

basis = Basis([x; x.^2; u; u^2], x, controls = [u] iv = t)

I might be missing the point ( except of course for numerical efficiency of the examples ), but basically it is a WYSIWYG approach.

In short, replacing the cartpole tutorial by the model above or the AkzoNobel benchmark https://benchmarks.sciml.ai/html/DAE/ChemicalAkzoNobel.html suggested to me by @ChrisRackauckas would be a better tutorial of the MTK , DAEProblem and ContinuousDataDrivenProblem sequence.

I think it is a valid addition to the example section, but it might be even better to stick to the publication examples from the original paper ( either the SINDy-PI or Implicit SINDy ) where good trajectories are available. Also, this examples seems quite intense for the recovery and might not be easy to understand for all possible users ( I am a mechanical engineer, so the cart pole is by far the most accessible example for me in terms of DAE ). I think adding a third and forth example is definitely worth it. I would however start with the regulatory network.

AlCap23 commented 2 years ago

I've added some more examples with #335 ( both for implicit systems and one prototypical PDE discovery for the heat equation ).

Additionally, I've found an error in the preprocessing of the implicit optimiser, which should also be fixed.

Naming of the problem is now possible ( but without further effect at the moment).

finmod commented 2 years ago

@AlCap23 Nice improvements in the WIP. Let me restate my MWE for a DAE system discovered from data. the Continous DataDriven(odae_sol) is now working. The next issue arises with the syntax for defining the basis and whether it takes it directly from odae_sol or if I have define the basis ex novo as a polynomial in u.

using ModelingToolkit
using LinearAlgebra
using DataDrivenDiffEq
using OrdinaryDiffEq, Sundials
using Plots; gr()

ModelingToolkit.@parameters begin
    ϕ₀ =0.0401
    ϕ₁ =6.41e-05
    η  =0
    k₀ =-0.0065
    k₁ =-5.0
    k₂ =20
    α  =0.025
    δ  =0.05
    β  =0.01
    r  =0.03
    cor=3.0
    fp =0.333
    b  =0.135
end

@variables begin
    t
    ω(t)  = 0.75
    λ(t)  = 0.90
    d₁(t) = 0.5
    π(t)
    Y(t)  = 100.0
    𝛎(t)
    𝛟(t) 
    κ(t) 
    g(t)    
end

Dₜ = Differential(t)

# Variable functions
𝛎 = (t, ω)    ->    (1/fp)*((1 -  ω)/b)^η 
𝛟 = (t, λ)    ->    -ϕ₀ + ϕ₁/((1 - λ)^2)
κ = (t, π)    ->    k₀ + exp(k₁ + k₂*π)
g = (t, ω, π) ->    κ(t, π)/𝛎(t, ω) - δ

sfcsys = [
    Dₜ(ω)  ~ ω*(𝛟(t, λ) - α)
    Dₜ(λ)  ~ λ*(g(t, ω, π) - α - β)
    Dₜ(d₁) ~ d₁*(r - κ(t, π)/𝛎(t, ω)) + ω - 1 + κ(t, π)
    Dₜ(Y)  ~ Y*g(t, ω, π)
    0.    ~ π - 1 + ω + r*d₁
]

ModelingToolkit.@named primalKeen = ModelingToolkit.ODESystem(sfcsys)
tspan = (0.0, 80.0)

rdKeen = structural_simplify(primalKeen)
full_equations(rdKeen)

dt=0.25
odaeprob = ODAEProblem(rdKeen,[],tspan)
odae_sol = solve(odaeprob,Tsit5(),abstol=1/10^14,reltol=1/10^14, saveat=dt, maxiters=1e7);    # 11 ms
prob = ContinuousDataDrivenProblem(odae_sol)

@parameters t
# @variables ω(t) λ(t) d₁(t) π(t) Y(t)
@variables u[1:4](t)
Ψ = Basis([u; u[1]^2], [u; u[2]^2], [u; u[3]^2], u, iv = t)
#Ψ = Basis(polynomial_basis(u, 2), iv = t)     #[u; u[1]^2], u, independent_variable = t)
res = solve(prob, Ψ, DMDPINV(), digits = 1)
system = result(res)
#md println(res) # hide
#md println(system) # hide
#md println(parameters(res)) # hide
# The underlying dynamics have been recovered correctly by the algorithm!

gives the error:

MethodError: no method matching Basis(::Vector{Num}, ::Vector{Num}, ::Vector{Num}, ::Symbolics.Arr{Num, 1}; iv=t) Closest candidates are: Basis(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) at C:\Users\Denis.julia\packages\DataDrivenDiffEq\hVSnE\src\basis\type.jl:50 got unsupported keyword argument "iv" Basis(::Vector{Equation}, ::Vector, ::Vector, ::Vector, ::Vector, ::Num, ::Function, ::Symbol, ::Vector{Basis}) at C:\Users\Denis.julia\packages\DataDrivenDiffEq\hVSnE\src\basis\type.jl:50 got unsupported keyword argument "iv" Basis(::AbstractVector, ::AbstractVector; parameters, iv, controls, observed, name, simplify, linear_independent, eval_expression, kwargs...) at C:\Users\Denis.julia\packages\DataDrivenDiffEq\hVSnE\src\basis\type.jl:72 ...

Stacktrace: [1] top-level scope @ In[3]:4 [2] eval @ .\boot.jl:373 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base .\loading.jl:1196