SciML / DataDrivenDiffEq.jl

Data driven modeling and automated discovery of dynamical systems for the SciML Scientific Machine Learning organization
https://docs.sciml.ai/DataDrivenDiffEq/stable/
MIT License

DMD-like solves fail for systems with at least 32 variables #379

Closed · bgroenks96 closed 1 year ago

bgroenks96 commented 1 year ago

I am running into a weird error when trying to run DMD (both linear and nonlinear) on some data generated by a PDE model and managed to isolate it in an MWE:

using ComponentArrays
using DataDrivenDiffEq
using ModelingToolkit

# works
X = randn(31,366)
t = collect(1.0:366.0)
data_prob = ContinuousDataDrivenProblem(X, t)
sol = solve(data_prob, DMDSVD())
# does not work
X = randn(32,366)
t = collect(1.0:366.0)
data_prob = ContinuousDataDrivenProblem(X, t)
sol = solve(data_prob, DMDSVD())
# ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Float64})

See full stack trace below.

Basically, this method error occurs only when the number of variables is >= 32, which seems to imply there is some threshold somewhere that is messing up the element types of the Jacobian vector. Digging into the code a bit, it looks like the error occurs when building the Jacobian function. It's possible that this is actually a bug in Symbolics.jl, but I'll leave it here for now.
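
A minimal probe of the symbolic side (my own sketch, assuming the failure really is in the Jacobian construction; the 32-variable linear map below is illustrative only and not taken from the package source):

using Symbolics

# Hypothetical probe: 32 symbolic states and the Jacobian of a random linear
# map in them, roughly mimicking what I suspect happens when the basis'
# Jacobian function is built. If this misbehaves too, the bug is upstream.
@variables u[1:32]
us = collect(u)                 # scalarize the symbolic array
eqs = randn(32, 32) * us        # 32 linear expressions in the states
J = Symbolics.jacobian(eqs, us)
@show eltype(J)                 # expect Num throughout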

ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Float64})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  *(::StridedVecOrMat, ::Adjoint{<:Any, <:LinearAlgebra.LQPackedQ}) at ~/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/lq.jl:269
  *(::StridedVecOrMat, ::LinearAlgebra.LQPackedQ) at ~/julia-1.8.2/share/julia/stdlib/v1.8/LinearAlgebra/src/lq.jl:293
  ...
Stacktrace:
 [1] init(::DataDrivenProblem{Float64, true, DataDrivenDiffEq.Continuous}, ::Basis, ::DMDSVD{Float64}; B::Vector{Any}, eval_expression::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ DataDrivenDiffEq ~/.julia/packages/DataDrivenDiffEq/s9jl3/src/solve/koopman.jl:102
 [2] init
   @ ~/.julia/packages/DataDrivenDiffEq/s9jl3/src/solve/koopman.jl:82 [inlined]
 [3] init(::DataDrivenProblem{Float64, true, DataDrivenDiffEq.Continuous}, ::DMDSVD{Float64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ DataDrivenDiffEq ~/.julia/packages/DataDrivenDiffEq/s9jl3/src/solve/koopman.jl:53
 [4] init(::DataDrivenProblem{Float64, true, DataDrivenDiffEq.Continuous}, ::DMDSVD{Float64})
   @ DataDrivenDiffEq ~/.julia/packages/DataDrivenDiffEq/s9jl3/src/solve/koopman.jl:42
 [5] solve(::DataDrivenProblem{Float64, true, DataDrivenDiffEq.Continuous}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ CommonSolve ~/.julia/packages/CommonSolve/TGRvG/src/CommonSolve.jl:23
 [6] solve(::DataDrivenProblem{Float64, true, DataDrivenDiffEq.Continuous}, ::DMDSVD{Float64})
   @ CommonSolve ~/.julia/packages/CommonSolve/TGRvG/src/CommonSolve.jl:23
 [7] top-level scope
   @ ~/workspace/repos/cryogridml/experiments/koopman/data_driven_cryogrid.jl:60
julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 
(koopman) pkg> st DataDrivenDiffEq
Status `~/workspace/repos/cryogridml/experiments/koopman/Project.toml`
  [2445eb08] DataDrivenDiffEq v0.8.5
AlCap23 commented 1 year ago

I'll try to include this as a test within #371

AlCap23 commented 1 year ago

Have you looked into the derivatives and the conditioning of the matrices?
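
For reference, a rough sketch of how to check both on the MWE data (the finite differences below are only a stand-in for the collocation DataDrivenDiffEq performs internally):

using LinearAlgebra

# Conditioning of the snapshot matrix and a crude derivative estimate,
# independent of any package internals.
X = randn(32, 366)
t = collect(1.0:366.0)
@show cond(X)                    # very large values hint at an ill-posed fit
DX = (X[:, 2:end] .- X[:, 1:end-1]) ./ (t[2:end] .- t[1:end-1])'
@show extrema(DX)                # sanity check on the derivative magnitudes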

I've experimented a little and it takes a long time, but the overall solution seems to work (at least on the current main branch).

Basically, if the system states are random, it might be hard to find a good approximation of the operator.

A toy example

using DataDrivenDiffEq, LinearAlgebra

# Stable linear system with weak random coupling
A = -I + 0.001 * randn(50, 50)
x0 = [10 * randn(50)]    # trajectory snapshots, starting from a random state
dx0 = Vector{Float64}[]  # exact derivatives A*x
t = [0.0]
for i in 1:200
    push!(dx0, A * x0[end])
    push!(t, t[end] + 0.01)             # advance time first ...
    push!(x0, exp(A * t[end]) * x0[1])  # ... then record the exact state at that time
end
push!(dx0, A * x0[end])
X = hcat(x0...)
DX = hcat(dx0...)
data_prob = ContinuousDataDrivenProblem(X, t, DX)
using Plots
plot(data_prob, legend = false)

jsol = solve(data_prob, DMDSVD(), options = DataDrivenCommonOptions(digits = 2))
# Adapted API
plot(jsol)
print(get_basis(jsol))
get_parameter_values(get_basis(jsol))

Results in

Model ##Basis#431 with 50 equations
States : 50
Parameters : 2328
Independent variable: t
Equations
...

Maybe even better:

# Skew-symmetric system: purely oscillatory, well-conditioned dynamics
A = randn(50, 50)
A = (A .- A') / 2
x0 = [randn(50)]
dx0 = Vector{Float64}[]
t = [0.0]
for i in 1:200
    push!(dx0, A * x0[end])
    push!(t, t[end] + 0.01)
    push!(x0, exp(A * t[end]) * x0[1])
end
push!(dx0, A * x0[end])
X = hcat(x0...)
DX = hcat(dx0...)
data_prob = ContinuousDataDrivenProblem(X, t)  # DX is not passed, so derivatives are estimated by collocation
using Plots
plot(data_prob, legend = false)

jsol = solve(data_prob, DMDPINV(), options = DataDrivenCommonOptions(digits = 10))
plot(jsol)
print(get_basis(jsol))
print(jsol)

norm(reshape(get_parameter_values(get_basis(jsol)), 50, 50) .- A) # This is high even though the fit is quite good
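
A possibly fairer comparison than the entry-wise norm (my own sketch, continuing from jsol and A above) is to look at the spectra, since the eigenvalues and modes are what DMD is primarily after:

using LinearAlgebra

# Compare the spectrum of the identified operator with the true one.
# The reshape follows the line above; sorting by imaginary part relies
# on A being skew-symmetric (purely imaginary eigenvalues).
Â = reshape(get_parameter_values(get_basis(jsol)), 50, 50)
ω_true = sort(imag.(eigvals(A)))
ω_est  = sort(imag.(eigvals(Â)))
@show norm(ω_true .- ω_est)
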
bgroenks96 commented 1 year ago

I used the randn data just for the purposes of the MWE. I originally discovered the issue in an actual use case with data generated by solving a 1D PDE, which seemed to magically stop working once the discretized system had > 32 variables. I will take another look; maybe it is actually just a coincidence.

AlCap23 commented 1 year ago

I feared this might be the case 😅. Could you have a look at the derivatives, or maybe plot the problem?

Also, I might add a LinearSolve interface for DMDPINV, or even for all of the Koopman algorithms, which would allow for preconditioners and a smoother experience.
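
For context, the generic LinearSolve.jl workflow looks roughly like this (a sketch with placeholder data; how the Koopman solves would plug into it is exactly the open design question above):

using LinearSolve

# Generic LinearSolve.jl usage with an iterative solver; preconditioners and
# caching are handled by the LinearSolve interface rather than by hand.
A = randn(32, 32)
b = randn(32)
prob = LinearProblem(A, b)
sol = solve(prob, KrylovJL_GMRES())
sol.u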

AlCap23 commented 1 year ago

In #371 I tested the 32-variable case, and it works on my machine.

The problem is more on the side of the random data, but feel free to reopen.