`np.median(model.coef_list, axis=0)` does not give the same answer as `model.print()` from first example in ensembling documentation

scatr commented 5 months ago

I've been working through some of the ensemble examples in the documentation where it's stated that model.print() returns the median of the coefficients in the ensemble coefficient list. When I try to median the coefficients myself, I do not get the same answer. Am I making a mistake in how the median is taken or from the subset of models it's taken? Thanks in advance.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.metrics import mean_squared_error
from pysindy.utils.odes import lorenz

# Ignore integration and solver convergence warnings
import warnings
from scipy.integrate.odepack import ODEintWarning
warnings.simplefilter("ignore", category=UserWarning)
warnings.simplefilter("ignore", category=ODEintWarning)

import pysindy as ps

# Seed the random number generators for reproducibility
np.random.seed(100)

# integration keywords for solve_ivp, typically needed for chaotic systems
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

dt = .002
t_train = np.arange(0, 10, dt)
t_train_span = (t_train[0], t_train[-1])
x0_train = [-8, 8, 27]
x_train = solve_ivp(lorenz, t_train_span, x0_train,
                    t_eval=t_train, **integrator_keywords).y.T

# add 1% noise to add a little complexity
# (otherwise all the models are basically correct)
rmse = mean_squared_error(x_train, np.zeros(x_train.shape), squared=False)
x_train = x_train + np.random.normal(0, rmse / 100.0, x_train.shape)
x_dot_train = ps.FiniteDifference()._differentiate(x_train, t_train)

# Evolve the Lorenz equations in time using a different initial condition
t_test = np.arange(0, 15, dt)
t_test_span = (t_test[0], t_test[-1])
x0_test = np.array([8, 7, 15])
x_test = solve_ivp(lorenz, t_test_span, x0_test,
                   t_eval=t_test, **integrator_keywords).y.T

# Instantiate and fit the SINDy model
feature_names = ['x', 'y', 'z']
ensemble_optimizer = ps.STLSQ()
model = ps.SINDy(feature_names=feature_names, optimizer=ensemble_optimizer)
# Ensemble with replacement (V1)
model.fit(x_train, t=dt, ensemble=True, quiet=True, n_models=20)
print("model.print()")
model.print()

ensemble_coefs = np.median(model.coef_list, axis=0)
ensemble_mean_coefs = np.mean(model.coef_list, axis=0)

model.coefficients()[:] = ensemble_coefs
print("model print with median coefficients")
model.print()
model.coefficients()[:] = ensemble_mean_coefs
print("model print with mean coefficients")
model.print()

Here's the printout I obtained

model.print()
(x)' = -0.046 1 + -9.937 x + 9.945 y
(y)' = 0.126 1 + 27.773 x + -0.935 y + -0.995 x z
(z)' = -0.109 1 + -2.654 z + 0.999 x y
model print with median coefficients
(x)' = 0.169 1 + -9.916 x + 9.935 y
(y)' = -0.060 1 + 27.709 x + -0.974 y + -0.989 x z
(z)' = -0.408 1 + -2.637 z + 1.000 x y
model print with mean coefficients
(x)' = 0.161 1 + -9.894 x + 9.872 y + 0.002 z
(y)' = -0.227 1 + 27.465 x + -0.836 y + 0.023 z + -0.988 x z
(z)' = -0.452 1 + -0.085 x + 0.056 y + -2.643 z + 0.998 x y

Jacob-Stevens-Haas commented 5 months ago

Two things:

The final step of any optimizer that allows it is unbiasing (running an unregularized regression with all zero terms dropped out)
We no longer use ensembling kwargs to model.fit. We now have an optimizer wrapper called EnsembleOptimizer. Try:
```
ensemble_optimizer = ps.EnsembleOptimizer(ps.STLSQ(), unbias=False)
```

Jacob-Stevens-Haas commented 4 months ago

Closing this - lmk if my answer didn't resolve and I'll reopen.

dynamicslab / pysindy

`np.median(model.coef_list, axis=0)` does not give the same answer as `model.print()` from first example in ensembling documentation #469