Closed yewalenikhil65 closed 3 years ago
Requesting help to understand the error in more detail.
The UserWarning
is telling you that some of the learned equations are all 0's. With the STLSQ
optimizer, the threshold you set will determine the smallest coefficient that is retained. You probably shouldn't need to set it as low as 0.00005. See the suggestions below for ways of dealing with this issue.
The LinAlgWarning
is saying that the matrix of library functions, Theta(X)
probably has columns that are very close to linearly dependent. This can occur if your state variables are highly correlated or can be expressed as polynomial functions of other state variables (since you're using a polynomial library). I've seen the latter come up when the exact solution of the underlying differential equations involved sines and cosines due to trig identities (e.g. if x1(t) = sin(t) and x2(t) = cos(t) are solutions, then x1^2 + x2^2 = 1, so the columns of Theta(X)
corresponding to 1, x1, and x2, will be linearly dependent).
A few things worth trying:
ps.STLSQ(normalize=True)
. Different scales in input features can affect the relative sizes of coefficients.alpha
, with ps.STLSQ(alpha=0.1)
. You can try different versions of alpha
and see what works best.Dx_train
. If the data are noisy, be sure to use a differentiation method that is robust to noise.Dx_train
derivatives of state-variables are obtained from the ode system-model(which is stiff) itself by plugging in the solution and evaluating the RHS. I am trying to derive the ode-system by SINDy again, however am unable to do so.
I notice that indeed there are some highly co-related state variables in my x_train
(as they have nearly same qualitative variation with time).
I have 7 states in x_train
= [x1(t),x2(t),x3(t),x4(t),x5(t),x6(t),x7(t)]
Following code, I guess includes all combinations of xy
from 7 state variables ( some of which are again co-related).
library_functions = [
lambda x, : x,
lambda x,y : x*y,
]
library_function_names = [
lambda x, : x,
lambda x,y : '(' + x + '*' + y + ')',
]
But suppose of all combinations I wish to specifically select only 2 or 3 combinations such as x2*x3
or x3*x4
which are not much co-related. How I do proceed further to avoid including all the combinations ?
But suppose of all combinations I wish to specifically select only 2 or 3 combinations such as
x2*x3
orx3*x4
which are not much co-related. How I do proceed further to avoid including all the combinations ?
Did anything I suggested help at all? Increasing the regularization parameter, alpha
, should help. If the system is stiff, it is likely that there are at least two different timescales coming into play, so you may be able to improve your results by using a non-uniform time sampling scheme. See this paper for further information. You may also be able to get away with simply increasing the sampling rate.
Hello sir, Thanks for pointing me to the paper, I will go through it asap. Also will check the effect of increasing sampling rate.
Based on the earlier suggestions, here is what I found.
alpha
and normalization=True
helped in spitting out some non-zero terms inaccurately for couple of state-variables, but the other state-variables still remain zero.SR3
(and also STLSQ
with large alpha
)does not give any error or warning, but gives results that are not explainable physically(terms printed by the model are not possible).Lasso
also helped in spitting out some non-zero terms inaccurately for couple of state-variables, but it also gives warning of increasing max iterations
even after setting it to 2000Here is the explanation of the model, [x1(t), x2(t),...,x7(t)]
are the state-variables obtained from a high-dimensional(many state-variables) chemical-reactions full model which has even more number of state-variables. It has been proved in literature that a reduced-order model can be constructed from using only these 7 state variables, and the equivalent reduced order model of only 4 parameters is given below (which only approximately re-presents actual full model)
[k1, k2, k3, k4] = [6.93938e+20 , 3.49444e+03 ,1.48266e+08, 2.82183e-02]
# inaccurate parameters for reduced order model, obtained from optimization methods like curve fitting
[x1 ,x2,x3,x4,x5,x6,x7] @(t =0) = [100e-12, 0.0, 1.6e-07 , 2.0e-08 , 0.0 , 1.4e-06, 0.0 ]
# units (M)
t_span = 0:800; # time span for ODE system ,. 800s
x1' = −k1 (x1 x2 x3 x4)
x2' = −k1 (x1 x2 x3 x4) + k2 (x1 x3 x6) + k3 (x5 x6) − k4 x2
x3' = −k1 (x1 x2 x3 x4)
x4' = −k1 (x1 x2 x3 x4)
x5' = k1 (x1 x2 x3 x4)
x6' = −k2(x1 x3 x6) − k3 (x5 x6)
x7' = k4 x2
I was trying to prove these above equations hold using SINDy, which might spit out more accurate values of parameter [k1,k2,k3,k4]
for which the reduced-order model accurately represents the full model.
Thanks for the detailed explanation. A couple of things stand out to me here:
x2
term, which you expect to have a smaller coefficient value and a larger threshold for, say, (x1 x2 x3 x4)
.Hi,
Polynomial library to degree four was used, But I was unable to get the correct result.
It really was because of parameter k
spanning many orders of magnitude. Increasing sampling time did no help. I also tried to tune the constrained SR3 optimizing for different library terms, but was unable to recover the equations.
Recently, I have come across a nice paper that does tackle this issue of parameters spanning orders of magnitude(as in chemical reactions networks). The key is to scale the linear system before minimizing it. Attaching the link here for the reference. Will check how this fares with SINDy Rapid data-driven model reduction of nonlinear dynamical systems including chemical reaction networks using ℓ1-regularization https://aip.scitation.org/doi/10.1063/1.5139463
Thanks for the reference. Setting normalization=True
should rescale each of the columns, but you might have better luck doing the rescaling by hand via the approach in the paper you linked. I'm curious to hear whether it works!
x_train
array of 801 x 7 (7 state variables, 801 time instances) , and same is shape of DerivateDx_train
array. I am facing following error of big-sparsity parameter error even with so small value of threshold, and also related to ill-conditioned matrix. I am new to pySINDy. Requesting help to understand the error in more detail.