[BUG] No consistency in model.fit()

ggmirandac commented 1 month ago

Hi,

I trying the software to use it for some projects, and I found a pretty weird bug in the code. For examples, if I have the following code:

tshol = 0.001
optimizer_stlsq = ps.STLSQ(threshold=tshol, alpha = 1)
optimizer_sr3 = ps.SR3(threshold=tshol, thresholder='l1')
ssr_optimizer = ps.SSR(alpha=0.5, criteria="model_residual", kappa=1)

dif = ps.SINDyDerivative(kind='kalman', alpha = 0.1)

feature_library = ps.PolynomialLibrary(degree=2)
model = ps.SINDy(optimizer=ssr_optimizer, feature_library=feature_library, differentiation_method=dif,
                 feature_names=['s1', 's2', 's3', 'm1', 'm2'])
model.fit(sindy_data_list, t=dt, multiple_trajectories=True) # Fit the model
model.print() # Print the model

And then I plot the model as:

test_x0 = np.array([0.9,0.2,0.9,0.9,0.1])
test_t_span = (0, 24)
dt = 1
test_t_eval = np.arange(0, 24, dt)
test_sol = spi.solve_ivp(CR, test_t_span, test_x0, args = (ns, nm, C, g, P), t_eval = test_t_eval)
simulation = model.simulate(test_x0, test_t_eval, integrator_kws={'atol': 1e-6, 'method': 'BDF', 'rtol': 1e-6},)

plt.plot(simulation)

I get the following plot:

And the next equation:

But If I run the cells again (considering that I am running this in a jupyter notebook). The code improves and get:

And the next equation:

Which is weird, because in theory rerunning the cells should change anythin.

Does anyone knows what could be happening?

PySINDy: 1.7.5 Python: 3.10.21

Jacob-Stevens-Haas commented 4 weeks ago

What could be happening is that optimizers could be reseeded with the most recent solution, similar to a warm-start in some optimization routines. But it's not clear exactly what you're running. E.g. if you run all the above code again, it creates an entirely new SSR object.

It might help if you can produce top-to-bottom code that shows what you're seeing.

ggmirandac commented 4 weeks ago

Yeah sure, how is it easier for you to see it? Like a file in a repo or should I attach it here?

Bests,

Jacob-Stevens-Haas commented 3 weeks ago

Reduce to the minimum code needed to demonstrate your problem, then post it here.

ggmirandac commented 3 weeks ago

Hi, sorry for the delay.

Here is a mini example of the issue:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pysindy as ps
import scipy.integrate as spi
import scipy.stats as spstats
import gurobipy
run_miosr = True
GurobiError = gurobipy.GurobiError

def gLV(t, x, ns, A, r):
    s = x[0:ns]
    dsdt = s * (r + np.dot(A, s))    
    return dsdt

def surrogate_dydt(t, y):
    _y = y[np.newaxis,:]
    return model.predict(x=_y)

# Generate the g vector it is 3x1
A = np.array([[-1, 0, -1], [0, -1, .1], [0, 0, -1]])
r = np.array([1, 1, 1])
x0 = np.array([0.5, 0.5, 0.5])
ns = len(x0)

# Solve the system
t_span = (0, 24)
dt = 20
t_eval = np.linspace(t_span[0], t_span[-1], dt)
n_reps = 100
sindy_data_list = []
x0_list = []
times_shape = []
for i in range(n_reps):
    x0 = np.random.dirichlet(np.ones(ns))
    x0_list.append(x0)
    sol = spi.solve_ivp(gLV, t_span, x0, args = (ns, A, r), t_eval = t_eval)
    simu = sol.y.T + np.random.normal(0, 0.1, sol.y.T.shape)
    sindy_data_list.append(simu)

plt.plot(sol.t, sol.y.T)

tshol = 0.001
optimizer_stlsq = ps.STLSQ(threshold=.01, alpha=.5)
optimizer_sr3 = ps.SR3(threshold=tshol, thresholder='l1')
ssr_optimizer = ps.SSR(alpha=0.5, criteria="model_residual", kappa=1)
# initialise the Sparse bayesian Regression optimizer.

dif = ps.SmoothedFiniteDifference()
feature_library = ps.PolynomialLibrary(degree=2)

model = ps.SINDy(optimizer=ssr_optimizer, feature_library=feature_library, differentiation_method=dif,
                feature_names=['s1', 's2', 's3', 'm1', 'm2'])
#model.fit(sindy_data_list, t=dt, multiple_trajectories=True) # Fit the model
model.fit(sindy_data_list, t=dt)
model.print() # Print the model

# set up a new differential equation that uses the Bayesian SINDy predictions.

# Solve the system
x0 = np.array([0.3,0.3,0.3])
t_span = (0, 24)
dt = 100
t_eval2 = np.linspace(t_span[0], t_span[-1], dt)
sol = spi.solve_ivp(surrogate_dydt, t_span, x0, t_eval = t_eval2)

fig, ax = plt.subplots(1,2, dpi = 600, figsize = (10,5))
ax[0].plot(sol.t, sol.y.T)

# simulation
simu = spi.solve_ivp(gLV, t_span, x0, t_eval = t_eval2, args = (ns, A, r))
ax[1].plot(simu.t, simu.y.T)

tshol = 0.001
optimizer_stlsq = ps.STLSQ(threshold=.01, alpha=.5)
optimizer_sr3 = ps.SR3(threshold=tshol, thresholder='l1')
ssr_optimizer = ps.SSR(alpha=0.5, criteria="model_residual", kappa=1)
# initialise the Sparse bayesian Regression optimizer.

dif = ps.SmoothedFiniteDifference()
feature_library = ps.PolynomialLibrary(degree=2)

model = ps.SINDy(optimizer=ssr_optimizer, feature_library=feature_library, differentiation_method=dif,
                feature_names=['s1', 's2', 's3', 'm1', 'm2'])
#model.fit(sindy_data_list, t=dt, multiple_trajectories=True) # Fit the model
model.fit(sindy_data_list, t=t_eval)
model.print() # Print the model

Jacob-Stevens-Haas commented 3 weeks ago

IIUC, the problem is that the two model.print() statements do not show the same model?

ggmirandac commented 3 weeks ago

yep yep

Jacob-Stevens-Haas commented 3 weeks ago

Then there is a lot of that code you can remove to make the example more minimal. Doing so shows the calls to model.fit() are slightly different:

model.fit(sindy_data_list, t=dt)
model.fit(sindy_data_list, t=t_eval)

If you look at how dt and t_eval differ, it's here:

t_eval = np.linspace(t_span[0], t_span[-1], dt)

The problem is that linspace doesn't take a dt argument. Abbreviated signature:

numpy.linspace(start, stop, num=50, ...)

You're looking for numpy.arange, which does take a step argument.

ggmirandac commented 2 weeks ago

Ohhhh thanks for the help now I get it.

dynamicslab / pysindy

[BUG] No consistency in model.fit() #518