Closed Jacob-Stevens-Haas closed 1 year ago
Totally agree this should be simplified. Currently, if multiple_trajectories=True
, the x
and u
must be python lists (rather than numpy arrays) and the list index distinguishes the trajectories, but I think that is hacky. It would be better to add a new axis to AxesArray
for the trajectory index and to have a default way to interpret axes if the input is a numpy array or list (such as assuming there is only one trajectory, the feature axis is last, the time axis is the second to last, and the remaining axes are spatial). Then, to input multiple trajectories, the user would just need to pass AxesArray
's for x
and u
with a labeled trajectory axis.
Part of the problem with a trajectory axis is that trajectories don't need to be compatible sizes except for PDE problems. Numpy doesn't have a ragged array capability.
Yes, you're right (and on that point, it is really inappropriate that the spatial grids need to be identical for PDEs, e.g. #222). The only issue I see with interpreting the number of trajectories based on whether x
and/or u
are lists alone is that some users may not know that they have to cast the trajectories input to numpy arrays for it to be interpreted correctly.
they have to cast the trajectories input to numpy arrays for it to be interpreted correctly
The docstring seems pretty clear on this. What would the trajectories inputs be if not numpy arrays?
The potentially dangerous situation is if there is only one trajectory but they pass the data as a python list rather than a numpy array. For example, if they generate the data in a python loop by appending without casting it to an array afterwards, like
xs=[]
x = 0.5
for n in range(N):
xs = xs+[x]
x = r * x * (1 - x)
model.fit(xs)
The input would be potentially by interpreted as N
trajectories rather than a single trajectory of N
timesteps. The situation is even more of a headache if each trajectory is spatiotemporal and has more than one feature...
Yeah, but I'd consider it as a design-by-contract exception. If the docstring is clear and they submit bad data, we don't need a parameter for them to just confirm that they've submitted the data correctly. At least with spatiotemporal trajectories, they have to submit an argument for the spatiotemporal grid. That would cause an error if we misinterpreted the shape of the data.
FWIW, numpy
works the same way. If I run
arrs = [np.arange(4*i, 4*i+4).reshape((2,2)) for i in range(2)]
np.transpose(arrs)
Numpy assumes that if you submit a list of arrays, the list is the first axis of a combined array. Likewise, I suggest that if we receive a list of arrays, we assume the list is a list of trajectories (given that it's documented as such).
Is your feature request related to a problem? Please describe.
Sorry for yet another suggestion, but this one is super simple:
I'm wondering whether the
multiple_trajectories
kwarg toSINDy.fit()
is unneccessary. I'm all about the Zen of Python, which contains the dictum: "Explicit is better than implicit". But the docstring is clear, and since all calls toSINDy.fit()
now begin with_adapt_to_multiple_trajectories()
, there's a natural place for input validation and guard code. It's like we have an extra argument just to ask the user "did you really mean to callfit()
with these args?" And we've definitely gotten GH issues somewhat regularly where we need to explain the argument.Describe the solution you'd like
With the removal of ensembling legacy code (I'll PR that in the next day, probably), removing
multiple_trajectories
would leaveDescribe alternatives you've considered
Not doing this.
Additional context
To see why this arg feels weird, consider: