dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/

Use case for sparse input to PolynomialLibrary? #223

Closed Jacob-Stevens-Haas closed 2 years ago

Jacob-Stevens-Haas commented 2 years ago

Hey @kpchamp, we've been refactoring a lot of pysindy and built some infrastructure to maintain backwards compatibility for sparse inputs. We'd like to document why they're useful, and git blame suggests that you added the sparse functionality. I can't think of a time series that is zero at most points, but maybe the use case was a non-dynamical one: calculating the LHS of the regression in order to leverage the SINDy optimizers without the time-differentiation step?

-Jake

The relevant docstring for PolynomialLibrary.transform:

    @x_sequence_or_item
    def transform(self, x_full):
        """Transform data to polynomial features.

        Parameters
        ----------
        x : array-like or CSR/CSC sparse matrix, shape (n_samples, n_features)
            The data to transform, row by row.
            Prefer CSR over CSC for sparse input (for speed), but CSC is
            required if the degree is 4 or higher. If the degree is less than
            4 and the input format is CSC, it will be converted to CSR, have
            its polynomial features generated, then converted back to CSC.
            If the degree is 2 or 3, the method described in "Leveraging
            Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices
            Using K-Simplex Numbers" by Andrew Nystrom and John Hughes is
            used, which is much faster than the method used on CSC input. For
            this reason, a CSC input will be converted to CSR, and the output
            will be converted back to CSC prior to being returned, hence the
            preference of CSR.

        Returns
        -------
        xp : np.ndarray or CSR/CSC sparse matrix,
                shape (n_samples, n_output_features)
            The matrix of features, where n_output_features is the number
            of polynomial features generated from the combination of inputs.
        """

kpchamp commented 2 years ago

Hey Jake, I think the only reason this has sparse functionality is that it was a modification of sklearn.preprocessing.PolynomialFeatures, which has that functionality. I don’t believe there was any specific use case in mind.
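
For readers wondering what the inherited sparse path buys in principle, here is a rough pure-Python sketch of a degree-2 expansion that only touches stored (nonzero) entries, one row at a time. It is not the Nystrom and Hughes algorithm and not the pysindy or sklearn implementation; the function name `sparse_degree2_row` and the dict-per-row layout are assumptions made purely for illustration.

```python
from itertools import combinations_with_replacement


def sparse_degree2_row(row):
    """Expand one sparse row {column: value} into its nonzero degree-2 terms.

    Returns {(i, j): x_i * x_j} with i <= j. Only pairs of stored entries are
    visited, so the work scales with the number of nonzeros in the row rather
    than with the total number of features.
    """
    cols = sorted(row)
    return {
        (i, j): row[i] * row[j]
        for i, j in combinations_with_replacement(cols, 2)
    }


# Hypothetical data: 2 samples out of a 1000-feature space, stored row by row
# (the same access pattern that CSR gives cheaply).
rows = [
    {3: 2.0, 7: 5.0},  # sample 0 has two nonzero features
    {500: 4.0},        # sample 1 has a single nonzero feature
]
expanded = [sparse_degree2_row(r) for r in rows]
```

Because each row is expanded independently, a row-oriented layout is the natural fit, which is consistent with the docstring's stated preference for CSR. For dense time-series measurements, where almost every entry is nonzero, this kind of shortcut offers little benefit.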

On Sun, Jul 3, 2022 at 3:49 PM Jacob Stevens-Haas wrote: (quoted copy of the original comment and docstring above)

akaptano commented 2 years ago

@Jacob-Stevens-Haas Feel free to delete this part of the code, push the fix to main, and close this out whenever you have time. :)