SciML / DataDrivenDiffEq.jl

Data driven modeling and automated discovery of dynamical systems for the SciML Scientific Machine Learning organization
https://docs.sciml.ai/DataDrivenDiffEq/stable/
MIT License

Problem rework and subsampling #330

Closed AlCap23 closed 2 years ago

AlCap23 commented 2 years ago

Adds more insight into a problem via show and print. Allows naming a problem. Asserts whether a problem is applicable to a basis.

TODO: Subsampling of the problem
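
A rough sketch of how these additions might be used; the name keyword and the applicability check is_applicable(prob, basis) are assumptions for illustration, not confirmed API:

using DataDrivenDiffEq
using ModelingToolkit

# Dummy measurements, just to construct a problem
X = rand(2, 100)
t = collect(0.0:0.1:9.9)

# Name the problem (assumed keyword) and inspect it via the new show/print methods
prob = ContinuousDataDrivenProblem(X, t, name = :Pendulum)
println(prob)

# Check whether the problem can be used with a given basis (assumed function name)
@variables u[1:2]
basis = Basis(polynomial_basis(u, 2), u)
DataDrivenDiffEq.is_applicable(prob, basis)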

AlCap23 commented 2 years ago

Current example

using DataDrivenDiffEq
using LinearAlgebra
using ModelingToolkit
using OrdinaryDiffEq
using Plots 
using Statistics
using StatsBase
using Random

# Candidate basis: polynomials up to degree 5 plus sine and cosine of the states
@variables u[1:2]
basis = Basis([polynomial_basis(u, 5); sin.(u); cos.(u)], u)

# Pendulum dynamics with cubic damping and an additional cosine term
function pendulum(u, p, t)
    x = u[2]
    y = -9.81sin(u[1]) - 0.1u[2]^3 - 0.2 * cos(u[1])
    return [x;y]
end

u0 = [0.99π; -1.0]
tspan = (0.0, 20.0)
dt = 0.1
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat=dt)

# Extract the trajectory and corrupt it with additive Gaussian noise
X = Array(sol)
t = sol.t
Random.seed!(1234)
X = X .+ 1e-1*randn(size(X))

# Define the continuous problem, estimating derivatives from the noisy data with a Gaussian collocation kernel
dd_prob_noisy = ContinuousDataDrivenProblem(X, t, GaussianKernel())

# Split 80/20 into train and test data; batch the training data into 4 shuffled, repeated batches
sampler = DataSampler(Split(ratio = 0.8), Batcher(n = 4, shuffle = true, repeated = true, batchsize_min = 40))
# Sparse regression via ADMM over a range of sparsity thresholds
opt = ADMM(1e-3:1e-3:5e-1)
sol = solve(dd_prob_noisy, basis, opt, maxiter = 1000, by = :stat, progress = true, denoise = true, normalize = true, sampler = sampler)
sol

Output of the result and parameters

julia> println(result(sol))
Model ##Basis#413 with 2 equations
States : u[1] u[2]
Parameters : 7
Independent variable: t
Equations
Differential(t)(u[1]) = p₁*u[2]
Differential(t)(u[2]) = p₂*u[2] + p₃*(u[2]^3) + p₄*(u[2]^5) + p₅*sin(u[1]) + p₆*cos(u[1]) + p₇*cos(u[2])

julia> parameters(sol)
7-element Vector{Measurements.Measurement{Float64}}:
   0.958 ± 0.051
  -0.075 ± 0.096
  -0.072 ± 0.049
 -0.0025 ± 0.005
   -8.76 ± 0.31
    -0.1 ± 0.2
    -0.2 ± 0.14

The DataSampler consists of two major components, which can also be used individually: Split defines a train/test split, and Batcher creates batches of the training data. Both take random seeds (which still need to be exposed in the API) to ensure reproducibility.
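
As a minimal sketch, using the constructors shown in the script above (whether a DataSampler can be built from a single component is an assumption here):

# Train/test split only, keeping 80% of the data for training
split_only = DataSampler(Split(ratio = 0.8))

# Split plus batching of the training data into 4 shuffled batches
split_and_batch = DataSampler(Split(ratio = 0.8), Batcher(n = 4, shuffle = true))

# Passed to solve via the sampler keyword, as in the script above
sol = solve(dd_prob_noisy, basis, opt, sampler = split_and_batch)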

The keyword by controls how the result is selected across the batches. In the script above it is set to :stat, which aggregates the batch results statistically and returns the parameters as Measurements; any other value falls back to the default :min.
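
For example (a sketch using the keyword as in the script above; passing :min explicitly is assumed to behave like the default):

# Statistical aggregation over the batches: parameters are returned as Measurements
sol_stat = solve(dd_prob_noisy, basis, opt, sampler = sampler, by = :stat)

# Anything else falls back to the default :min selection
sol_min = solve(dd_prob_noisy, basis, opt, sampler = sampler, by = :min)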

Additionally, the original output of SINDy now contains the best sparsity thresholds, which can be retrieved via output(sol).λ
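
For example, using the field name as described above:

res = output(sol)
res.λ   # best sparsity thresholds found during the sparse regression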

AlCap23 commented 2 years ago

I still need to adapt the docs, but tests are green locally.

For this I'll open a separate PR, since I want to switch to Literate.jl for the examples. This would also allow us to generate notebooks as easily accessible starting points.

AlCap23 commented 2 years ago

This is the same script.

(Screenshot attached: 2022-02-04 17:34)

AlCap23 commented 2 years ago

I've flagged the test as broken for now. There is no evident reason why the result is good on v1.7 and off on v1.6, but I'll look into it once the rest of this is done. Maybe we should hold off on a release for now.

Edit: I've checked some of the possible sources, but found no clear cause of the problem so far. Since using more samples seems to work on both versions, I've adapted the tests slightly. I am still confused as to why this is an issue; there could have been scaling issues and minor bugs related to implicit identification which played in our favour here.