dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/
Other
1.36k stars 304 forks source link

Passing custom derivatives instead of using predefined differentiation classes #464

Closed b-fg closed 5 months ago

b-fg commented 5 months ago

I have a function u(x,t) for which I can compute spatial (u_x, u_xx, ...) and temporal derivatives (u_t, u_tt, ...) using my own numerical method. Then I have all the training data for u(x,t) and the temporal and spatial derivatives associated to those samples. How can I use this data to try to find an expression for u_t = ? without using the built-in derivatives methods in PySINDy?

Do I need to write a BaseDifferentiation-inherited class implementing _differentiate or can I just pass the temporal derivatives using the x_dot argument in pysindy.SINDy? How about the spatial derivatives, should I just construct a CustomLibrary with the spatial derivatives or is there another way to pass them to function library? Or maybe I could just construct the function library (matrix) myself and pass it directly to pysindy.SINDy?

In summary, I have a function u(x,t) that I can use to generate u given x,t and the associated partial derivatives: u_t, u_tt, ..., u_x, u_xx, ..., u_xt which I want to use to find and expression for u_t. What's the best way of doing so?

Thanks in advance!

Jacob-Stevens-Haas commented 5 months ago

Ideally, if your method of generating derivatives would be useful to other people, you could write a BaseDifferentiation subclass (or fork derivative package and add a method to their API, which SINDyDerivative uses under the hood) and PR it.

A bit of clarification: lets say you have calculated $N$ time derivatives. I'm assuming that means you want to find a PDE with N time derivatives. Obviously if you try to find the coefficients for $\partial$ u_t $/\partial t$ and you let u_tt in as a candidate, the system ought to find it's coefficient is 1.

IIUC and you are looking for an Nth time-order PDE with a linear term on that Nth time derivative, here's how:

If instead you're looking for something nonlinear in the Nth order like $uu_{tt} = f(u, u_x, u_t, u_xx, u_xt)$, you would need SINDyPI. The x would be all terms with less than $N$ time derivatives and the x_dot would be all terms with at least one time derivative. The caveat is that it will fit M models, where M is the number of features expressed by applying the feature library to x, and you need to visually scan the results to find one that looks reasonable.

b-fg commented 5 months ago

Thank you for the detailed reply. The derivatives I am taking currently are WIP, but I will definitely try to contribute to this repo once I have a more validated approach.

In general, I have a dynamical system such that $u_t=f(x,t)=f(u,ux,u{xx},...)$, with non-linear combinations such as $(uu,uux,uu{xx},uxu{xx},...)$. For the moment, I am fine by not considering high-order temporal (or mixed) derivatives in $f$. For the moment I have been able to achieve this with the following pseudo-code, where XT is the space-time grid containing N samples :

  1. Compute feature arrays and target data $u_t$ (dudt)
    u = f(XT)
    dudx = ddx(u, XT)
    d2udx2 = d2dx2(u, XT)
    ...
    features = np.stack([u, dudx, d2udx2,...], axis=-1)
    dudt = ddt(u,XT)
  2. Fit polynomial library of p order
    model_ps = ps.SINDy(feature_library=ps.PolynomialLibrary(degree=p, include_bias=False), optimizer=ps.STLSQ())
    model_ps.fit(features, x_dot=dudt)

    Even though this usually spits out the correct PDE, I am not sure is the best it approach. For example, print(model_ps) shows differentiation_method=FiniteDifference(axis=-2), and I am unsure if this is doing something under the hood.

Also, I would like to ask if there is any specific way to cap the maximum number of terms that the optimizer can include in the expression. And, in this sense, I have observed that the only optimizer that could find the correct expression was STLSQ, the others I tried did not manage to do so. What optimizer should be best suited for this problem?

Thanks!

Jacob-Stevens-Haas commented 5 months ago

Ok, that makes sense. SINDy models' default differentiation method is FiniteDifference(axis=-2), but this is ignored if you pass x_dot. (in the current branch we label the array axes when the problem is created so we can be more explicit with FiniteDifference(axis=x.ax_time))

MIOSR optimizer, or Mixed-Integer Optimization, Sparse Regression, allows you to specify exactly how many terms should be included. SR3 (and her subclasses) allow a direct penalty on the number of terms with an L-0 penalty. SBR coming soon to a theater near you (#440) will give you a bit more control over the penalty for how many terms are nonzero and how far they're allowed to wander from the origin.

That said, I don't think any optimizer allows you to select a maximum term.

b-fg commented 5 months ago

Noted. So I understand that the approach I took is "SINDy-friendly" and appropriate to solve the problem I indented. Will also play a bit more with the optimizers. Thanks again for all the info!