dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/

[BUG] `PolynomialLibrary.fit()` with CSR matrix does not obey `include_interaction` #367

Closed Jacob-Stevens-Haas closed 1 year ago

Jacob-Stevens-Haas commented 1 year ago

Reproducing code example:

import numpy as np
import scipy.sparse as sp
from pysindy import PolynomialLibrary

x = sp.csr_matrix([[2,3]])
lib = PolynomialLibrary(degree=2, include_bias=False, include_interaction=False)
y = lib.fit_transform(x)

expected = np.array([[2, 3, 4, 9]])
result = y.toarray()
print(result)
# Actual result includes the interaction term 2*3 = 6, which should be absent:
# array([[2, 3, 4, 6, 9]])

Notes:

Found while troubleshooting #365

Our PolynomialLibrary differs from scikit-learn's PolynomialFeatures only in allowing the include_interaction parameter. To take advantage of fast polynomial features on sparse inputs, however, we were calling scikit-learn's compiled Cython function _csr_polynomial_expansion(). That function has no equivalent of include_interaction, so CSR inputs get interaction terms regardless of the setting.

The solution is to treat CSR matrices just like any other matrices. This won't be as fast in the extremely limited case where we're using a CSR matrix, low-degree polynomials, and include_interaction=True. See #223. I will also modify test_polynomial_options to check that library.powers_ matches the shape from fit_transform().
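
To make the expected behavior concrete, here is a minimal pure-Python sketch of how the term list changes with include_interaction. It is not PySINDy's or scikit-learn's implementation: polynomial_terms and expand_row are hypothetical helpers introduced only for illustration, and the enumeration order (by degree, then by combinations with replacement) is an assumption chosen to match the output in the example above.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(n_features, degree, include_interaction=True, include_bias=False):
    """Hypothetical helper: list each term as a tuple of feature indices."""
    start = 0 if include_bias else 1
    terms = []
    for d in range(start, degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            # A pure power uses one distinct feature; anything else is an interaction.
            if include_interaction or len(set(combo)) <= 1:
                terms.append(combo)
    return terms


def expand_row(row, terms):
    """Hypothetical helper: evaluate every term on a single sample."""
    return [prod(row[i] for i in combo) for combo in terms]


row = [2, 3]
terms_with = polynomial_terms(2, 2, include_interaction=True)
terms_without = polynomial_terms(2, 2, include_interaction=False)

with_inter = expand_row(row, terms_with)
without_inter = expand_row(row, terms_without)

print(with_inter)     # [2, 3, 4, 6, 9]
print(without_inter)  # [2, 3, 4, 9]

# Analogue of the planned test: the number of terms must equal the output width.
assert len(terms_without) == len(without_inter)
```

With interactions disabled, the mixed term for indices (0, 1) is dropped, which is why the expected output has four columns rather than five. The final assertion mirrors the idea of checking that the stored term description and the transformed output agree in shape.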

PySINDy/Python version information:

Affects master and all recent releases. Reproducing it requires scikit-learn<1.3; with newer versions a different error appears instead.