dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/
Other
1.46k stars 324 forks source link

[BUG] Iteration problem when using ensemble SINDy and GenerlizedLibrary #158

Closed BMP-TUD closed 2 years ago

BMP-TUD commented 2 years ago

Dear all,

sorry for reaching out to you again, after you already cleared up a few things with the WeakPDELibrary, but I encountered a problem with the Generlized library. Here, I am creating a large library only containing specific terms from a set of 6 different smaller libraries. I have a set of 5 variables, but not all variables are implemented in each Library. This works perfectly fine, with the use of the GeneralizedLibrary and I can fit the model normally. Here is the example on how I create the library (I suppose there is an easier way):

    A_T=1
    P_T=1
    G_T=1
    E_T=3

    lf0 = [
    lambda x : x,
        ]
    lf0 = [
        lambda x : x,
        ]
    l0=ps.CustomLibrary(library_functions=lf0,function_names=lf0,include_bias=False)

    lf1 = [
    lambda x,y : x*y,
        ]
    lf1 = [
        lambda x,y : x+y,
        ]
    l1=ps.CustomLibrary(library_functions=lf1,function_names=lf1,include_bias=True)

    lf2=[
        lambda x: A_T-x]
    lf2_names=[
        lambda x: 'A_T-'+x]
    l2=ps.CustomLibrary(library_functions=lf2, function_names=lf2_names, include_bias=False)

    lf3=[
        lambda x: P_T-x]
    lf3_names=[
        lambda x: 'P_T-'+x]
    l3=ps.CustomLibrary(library_functions=lf3, function_names=lf3_names, include_bias=False)

    lf4=[
        lambda x: G_T-x]
    lf4_names=[
        lambda x: 'G_T-'+x]
    l4=ps.CustomLibrary(library_functions=lf4, function_names=lf4_names, include_bias=False)

    lf5=[
        lambda x,y: E_T-x-y]
    lf5_names=[
        lambda x,y: 'E_T-'+x+'-'+y]
    l5=ps.CustomLibrary(library_functions=lf5, function_names=lf5_names, include_bias=False)

    inputs_temp=np.tile([0,1,2,3,4],6)
    inputs_per_library = np.reshape(inputs_temp, (6, 5))
    inputs_per_library[2, :] = 0 # Just use the u input for generating the library
    inputs_per_library[3, :] = 4 # Just use the y input for generating the library
    inputs_per_library[4, :] = 3 # Just use the x input for generating the library
    inputs_per_library[5, :3] = 3 # Just use the x and y input for generating the library
    tensor_array=[[1,0,1,0,0,0],[1,0,0,1,0,0], [1,0,0,0,1,0], [1,0,0,0,0,1], [0,0,1,1,0,0], [0,0,1,0,1,0], [0,0,1,0,0,1],
                  [0,0,0,1,1,0],[0,0,0,1,0,1], [0,0,0,0,1,1]]
    custom_library=ps.GeneralizedLibrary(libraries=[l0,l1,l2,l3,l4,l5],inputs_per_library=inputs_per_library, tensor_array=tensor_array)

The problem occurs when I try to run the ensemble method together with the generalized library as follows (x_noisy consists of 5 variables):

ensemble_optimizer = ps.STLSQ(threshold=0.03,alpha=lam)
custom_library=mass_conservation_library()
model = ps.SINDy(feature_names=feature_names, optimizer=ensemble_optimizer, feature_library=custom_library)
model.fit(x_noisy, t=dt, ensemble=True , n_models=n_models, quiet=True)

The error that I get is the following:

  File "XXX.py", line 348, in <module>
    model.fit(x_noisy_t, t=dt, ensemble=True , n_models=n_models, quiet=True)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\pysindy\pysindy.py", line 471, in fit
    self.model.fit(x_ensemble, x_dot_ensemble)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\sklearn\pipeline.py", line 390, in fit
    Xt = self._fit(X, y, **fit_params_steps)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\sklearn\pipeline.py", line 348, in _fit
    X, fitted_transformer = fit_transform_one_cached(

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\joblib\memory.py", line 349, in __call__
    return self.func(*args, **kwargs)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\sklearn\pipeline.py", line 893, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\sklearn\base.py", line 855, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\pysindy\feature_library\generalized_library.py", line 202, in fit
    fitted_libs = [

  File "C:\XXX\Anaconda3\envs\mamc\lib\site-packages\pysindy\feature_library\generalized_library.py", line 203, in <listcomp>
    lib.fit(x[:, np.unique(self.inputs_per_library_[i, :])], y)

IndexError: index 6 is out of bounds for axis 0 with size 6

The problem here is, that I tried to have a look at what is going on in the code, but changing something in there always breaks the GenerlizedLibrary script. Furthermore, I am confused that this did not happen when I ran the analysis with just the normal SINDy, because the library is the same for both. Maybe you can tell me if I am just doing something wrong, or what the issue could be. Thank you again in advance and I am looking forward to your answer,

Best wishes, Bartosz

akaptano commented 2 years ago

Hi Bartosz, glad you're using all the new functionality. What version of PySINDy are you using? We just corrected an ensembling-related bug from versions 1.5.1 -> 1.6.1

BMP-TUD commented 2 years ago

Thank you for quick response, I am using the newest version 1.6.1.

akaptano commented 2 years ago

Good, let me try to reproduce this now.

akaptano commented 2 years ago

Small oversight on our part. Looks like when ensemble=True, the library is fit each time, so it adds the tensored libraries the first time and then it is confused when the second call of fit happens. Fixed this and pushing the changes in a moment, so you'll need to get the latest version of the main branch. Let me know if you need anything else, and here is the silly example I used to debug:

import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp

def lorenz(t, x, sigma=10, beta=2.66667, rho=28):
    return [
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
        0,
        0,
    ]
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

dt = .002
t_train = np.arange(0, 10, dt)
x0_train = [-8, 8, 27, 10, 4]
t_train_span = (t_train[0], t_train[-1])
x_train = solve_ivp(lorenz, t_train_span, x0_train, 
                    t_eval=t_train, **integrator_keywords).y.T

A_T=1
P_T=1
G_T=1
E_T=3

lf0 = [
lambda x : x,
    ]
lf0 = [
    lambda x : x,
    ]
l0=ps.CustomLibrary(library_functions=lf0,function_names=lf0,include_bias=False)

lf1 = [
lambda x,y : x*y,
    ]
lf1 = [
    lambda x,y : x+y,
    ]
l1=ps.CustomLibrary(library_functions=lf1,function_names=lf1,include_bias=True)

lf2=[
    lambda x: A_T-x]
lf2_names=[
    lambda x: 'A_T-'+x]
l2=ps.CustomLibrary(library_functions=lf2, function_names=lf2_names, include_bias=False)

lf3=[
    lambda x: P_T-x]
lf3_names=[
    lambda x: 'P_T-'+x]
l3=ps.CustomLibrary(library_functions=lf3, function_names=lf3_names, include_bias=False)

lf4=[
    lambda x: G_T-x]
lf4_names=[
    lambda x: 'G_T-'+x]
l4=ps.CustomLibrary(library_functions=lf4, function_names=lf4_names, include_bias=False)

lf5=[
    lambda x,y: E_T-x-y]
lf5_names=[
    lambda x,y: 'E_T-'+x+'-'+y]
l5=ps.CustomLibrary(library_functions=lf5, function_names=lf5_names, include_bias=False)

inputs_temp=np.tile([0,1,2,3,4],6)
inputs_per_library = np.reshape(inputs_temp, (6, 5))
inputs_per_library[2, :] = 0 # Just use the u input for generating the library
inputs_per_library[3, :] = 4 # Just use the y input for generating the library
inputs_per_library[4, :] = 3 # Just use the x input for generating the library
inputs_per_library[5, :3] = 3 # Just use the x and y input for generating the library
tensor_array=[[1,0,1,0,0,0],[1,0,0,1,0,0], [1,0,0,0,1,0], 
              [1,0,0,0,0,1], [0,0,1,1,0,0], [0,0,1,0,1,0], [0,0,1,0,0,1],
              [0,0,0,1,1,0],[0,0,0,1,0,1], [0,0,0,0,1,1]]
custom_library=ps.GeneralizedLibrary(libraries=[l0,l1,l2,l3,l4,l5],
                                     inputs_per_library=inputs_per_library, 
                                     tensor_array=tensor_array)
lam = 0 
n_models = 20
feature_names = ['x', 'y', 'z', 'v', 'w']
ensemble_optimizer = ps.STLSQ(threshold=0.03, alpha=lam)
model = ps.SINDy(feature_names=feature_names, 
                 optimizer=ensemble_optimizer, 
                 feature_library=custom_library)
model.fit(x_train, t=dt, ensemble=True , n_models=n_models, quiet=True)
model.print()
BMP-TUD commented 2 years ago

Hi and thank you for the fast response and fix of the issue, really appreciate it. 👍

BMP-TUD commented 2 years ago

Hi, sorry for the maybe misguiding answer before, I still appreciate your work ;) but: I tried it now with the new 1.6.2 push but it does not seem to work. I get the same error message. I compared the code for this right now and nothing changed. Is the push already through? Or has something gone wrong? Thanks, bw Bartosz

akaptano commented 2 years ago

Are you cloning the GitHub repo directly or something else? The 1.6.2 push did not have the update... there is another git commit after that, containing the fix. If you are not cloning the repo directly, let me know, and I will just make a v1.6.3 release so you can get it

BMP-TUD commented 2 years ago

I am normally using pip to manage my packages or conda if necessary. I can just git-clone and have it straight away, but maybe it would be good to have it accessible through pip or conda. Thank you for your help :)

akaptano commented 2 years ago

Just published a new release (v1.6.3) with the change. Let me know if that does not fix your issue!

znicolaou commented 2 years ago

Thanks for the progress on this, @akaptano ! It looks like the last push and release went part of the way to addressing this issue, but I had to add another line to the generalized library for the weak case (to set the number of samples to the number of domains times the number of trajectories instead of x.shape[0]). I pushed this change to the weak_optimization branch for now.

I think some iterations of multiple_trajectories and nonweak/weak libraries may still cause issues, and we probably should do a few tests still before pushing another release. @BMP-TUD, I suggest you clone the github repository and checkout the weak_optimization branch for now. You can use pip to install the developmental version as well, as in:

git clone https://github.com/dynamicslab/pysindy.git
cd pysindy
git checkout weak_optimization
pip install .

Since you're using the weak libraries, you will also benefit from faster fitting in the weak_library branch, but you should remove the num_pts_per_domain option when you declare the WeakPDELibrary, since it is being eliminated. See the pull request #153 , which I'm linking to this issue for now. Some time soon we will merge this with master and create a new release.

akaptano commented 2 years ago

@znicolaou I think you're right but @BMP-TUD if you aren't using the weak form stuff, I think you should be able to get away with using v1.6.3 for now.

znicolaou commented 2 years ago

Oops, had a typo in that last push that I fixed, so I pushed again... FYI I am also trying to use the generalized library with multiple trajectories to fit an integro-differential equation with the weak form library, so hopefully I can flesh out all these issues as I go.

BMP-TUD commented 2 years ago

Thank you both, I accessed the package via git clone. It works perfectly. :)

akaptano commented 2 years ago

Addressed this in the new release I believe. Closing this now.