ValueError: operands could not be broadcast together with shapes (3,) (10,) when I tried to use SINDyPI optimizer on the Lorentz equation

ouslalu commented 10 months ago

Reproducing code example:

import pysindy
feature_names = ['x', 'y', 'z']
ensemble_optimizer = ps.STLSQ()

sindy_pi_optimizer = ps.SINDyPI(threshold = 0.1,
                                fit_intercept = True, verbose_cvxpy = False)

library_functions = [lambda x: x, lambda x: x * x, lambda x, y: x * y]
library_function_names = [lambda x: x, lambda x: x + x, lambda x, y: x + y]

ode_lib = ps.WeakPDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    spatiotemporal_grid=t_train,
    include_bias=True,
    is_uniform=True,
    K=10000,
)

#ps.
model = ps.SINDy(feature_library=ode_lib, 
                 feature_names=feature_names, 
                 optimizer = sindy_pi_optimizer)
# Ensemble with replacement (V1)
#nsemble=True, n_models=1000, quiet=True, library_ensemble= True
model.fit(x_train, t=dt)
model.print()

Error message:

47 model.fit(x_train, t=dt) 48 model.print() File ~/anaconda3/lib/python3.10/site-packages/pysindy/pysindy.py:414, in SINDy.fit(self, x, t, x_dot, u, multiple_trajectories, unbias, quiet, ensemble, library_ensemble, replace, n_candidates_to_drop, n_subset, n_models, ensemble_aggregator) 412 warnings.filterwarnings(action, category=LinAlgWarning) 413 warnings.filterwarnings(action, category=UserWarning) --> 414 self.model.fit(x, x_dot) 416 # New version of sklearn changes attribute name 417 if float(__version__[:3]) >= 1.0: File ~/anaconda3/lib/python3.10/site-packages/sklearn/pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params) 403 if self._final_estimator != "passthrough": 404 fit_params_last_step = fit_params_steps[self.steps[-1][0]] --> 405 self._final_estimator.fit(Xt, y, **fit_params_last_step) 407 return self File ~/anaconda3/lib/python3.10/site-packages/pysindy/optimizers/sindy_optimizer.py:60, in SINDyOptimizer.fit(self, x, y) 53 def fit(self, x, y): 55 x, y = drop_nan_samples( 56 AxesArray(x, {"ax_sample": 0, "ax_coord": 1}), 57 AxesArray(y, {"ax_sample": 0, "ax_coord": 1}), 58 ) ---> 60 self.optimizer.fit(x, y) 61 if not hasattr(self.optimizer, "coef_"): 62 raise AttributeError("optimizer has no attribute coef_") File ~/anaconda3/lib/python3.10/site-packages/pysindy/optimizers/base.py:189, in BaseOptimizer.fit(self, x_, y, sample_weight, **reduce_kws) 186 for i in range(np.shape(self.history_)[0]): 187 self.history_[i] = np.multiply(reg, self.history_[i]) --> 189 self._set_intercept(X_offset, y_offset, X_scale) 190 return self File ~/anaconda3/lib/python3.10/site-packages/sklearn/linear_model/_base.py:362, in LinearModel._set_intercept(self, X_offset, y_offset, X_scale) 358 if self.fit_intercept: 359 # We always want coef_.dtype=X.dtype. For instance, X.dtype can differ from 360 # coef_.dtype if warm_start=True. 361 self.coef_ = np.divide(self.coef_, X_scale, dtype=X_scale.dtype) --> 362 self.intercept_ = y_offset - np.dot(X_offset, self.coef_.T) 363 else: 364 self.intercept_ = 0.0 ValueError: operands could not be broadcast together with shapes (3,) (10,) -->

PySINDy/Python version information:

Jacob-Stevens-Haas commented 10 months ago

There's no data here, or a pysindy version. Please fill in the version info and add a small amount of data simple data we can use as x_train and dt to demonstrate the error.

ouslalu commented 10 months ago

The data can be generated by this code:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.metrics import mean_squared_error
from pysindy.utils.odes import lorenz

#Ignore integration and solver convergence warnings

import warnings
from scipy.integrate.odepack import ODEintWarning
warnings.simplefilter("ignore", category=UserWarning)
warnings.simplefilter("ignore", category=ODEintWarning)

import pysindy as ps

#Seed the random number generators for reproducibility
np.random.seed(100)

integration keywords for solve_ivp, typically needed for chaotic systems
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

dt = .002
t_train = np.arange(0, 10, dt)
t_train_span = (t_train[0], t_train[-1])
x0_train = [-8, 8, 27]
x_train = solve_ivp(lorenz, t_train_span, x0_train, 
                    t_eval=t_train, **integrator_keywords).y.T

# add 1% noise to add a little complexity 
# (otherwise all the models are basically correct)
rmse = mean_squared_error(x_train, np.zeros(x_train.shape), squared=False)
x_train = x_train + np.random.normal(0, rmse/100, x_train.shape)

# Evolve the Lorenz equations in time using a different initial condition
t_test = np.arange(0, 15, dt)
t_test_span = (t_test[0], t_test[-1])
x0_test = np.array([8, 7, 15])
x_test = solve_ivp(lorenz, t_test_span, x0_test, 
                   t_eval=t_test, **integrator_keywords).y.T

pysindy version: 1.7.5

Jacob-Stevens-Haas commented 10 months ago

This appears to be an issue with SINDyPI(fit_intercept=True) with multiple targets (in this case, x, y, and z). Setting the intercept runs the code:

self.intercept_ = y_offset - np.dot(X_offset, self.coef_.T)

In a non-PI optimizer, self.coef_ has the same number of rows as there are targets, which is the same shape as y_offset. However, in SINDyPI, self.coef_ has the same number of rows as there are models, so the subtraction only broadcasts when there is only one target.

In master we've removed fit_intercept from all optimizers (Making it strictly the job of the function library to provide a constant term). master is preparing for a backwards-incompatible release, so we probably won't release a 1.7.6. In the interim, you can set fit_intercept=False.

ouslalu commented 10 months ago

Thank you. This solves the problem.

dynamicslab / pysindy