dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/
Other
1.42k stars 310 forks source link

Make `multiple_trajectories` implicit. #353

Closed Jacob-Stevens-Haas closed 1 year ago

Jacob-Stevens-Haas commented 1 year ago

Is your feature request related to a problem? Please describe.

Sorry for yet another suggestion, but this one is super simple:

I'm wondering whether the multiple_trajectories kwarg to SINDy.fit() is unneccessary. I'm all about the Zen of Python, which contains the dictum: "Explicit is better than implicit". But the docstring is clear, and since all calls to SINDy.fit() now begin with _adapt_to_multiple_trajectories(), there's a natural place for input validation and guard code. It's like we have an extra argument just to ask the user "did you really mean to call fit() with these args?" And we've definitely gotten GH issues somewhat regularly where we need to explain the argument.

Describe the solution you'd like

With the removal of ensembling legacy code (I'll PR that in the next day, probably), removing multiple_trajectories would leave

# super clean
def fit(self, x, t=None, x_dot=None, u=None):

Describe alternatives you've considered

Not doing this.

Additional context

To see why this arg feels weird, consider:

numpy.mean(arr, axis=(1,2))
# versus
numpy.mean(arr, axis=(1,2), multiple_axes=True)
znicolaou commented 1 year ago

Totally agree this should be simplified. Currently, if multiple_trajectories=True, the x and u must be python lists (rather than numpy arrays) and the list index distinguishes the trajectories, but I think that is hacky. It would be better to add a new axis to AxesArray for the trajectory index and to have a default way to interpret axes if the input is a numpy array or list (such as assuming there is only one trajectory, the feature axis is last, the time axis is the second to last, and the remaining axes are spatial). Then, to input multiple trajectories, the user would just need to pass AxesArray's for x and u with a labeled trajectory axis.

Jacob-Stevens-Haas commented 1 year ago

Part of the problem with a trajectory axis is that trajectories don't need to be compatible sizes except for PDE problems. Numpy doesn't have a ragged array capability.

znicolaou commented 1 year ago

Yes, you're right (and on that point, it is really inappropriate that the spatial grids need to be identical for PDEs, e.g. #222). The only issue I see with interpreting the number of trajectories based on whether x and/or u are lists alone is that some users may not know that they have to cast the trajectories input to numpy arrays for it to be interpreted correctly.

Jacob-Stevens-Haas commented 1 year ago

they have to cast the trajectories input to numpy arrays for it to be interpreted correctly

The docstring seems pretty clear on this. What would the trajectories inputs be if not numpy arrays?

znicolaou commented 1 year ago

The potentially dangerous situation is if there is only one trajectory but they pass the data as a python list rather than a numpy array. For example, if they generate the data in a python loop by appending without casting it to an array afterwards, like

xs=[]
x = 0.5
for n in range(N):
    xs = xs+[x]
    x = r * x * (1 - x)
model.fit(xs)

The input would be potentially by interpreted as N trajectories rather than a single trajectory of N timesteps. The situation is even more of a headache if each trajectory is spatiotemporal and has more than one feature...

Jacob-Stevens-Haas commented 1 year ago

Yeah, but I'd consider it as a design-by-contract exception. If the docstring is clear and they submit bad data, we don't need a parameter for them to just confirm that they've submitted the data correctly. At least with spatiotemporal trajectories, they have to submit an argument for the spatiotemporal grid. That would cause an error if we misinterpreted the shape of the data.

FWIW, numpy works the same way. If I run

arrs = [np.arange(4*i, 4*i+4).reshape((2,2)) for i in range(2)]
np.transpose(arrs)

Numpy assumes that if you submit a list of arrays, the list is the first axis of a combined array. Likewise, I suggest that if we receive a list of arrays, we assume the list is a list of trajectories (given that it's documented as such).