Closed milancurcic closed 1 year ago
I would say it can be an xarray DataArray, an Awkward Array, or a NumPy array? In cases other than the nested Awkward Array, we need to think about how the function is made aware of trajectory breaks, perhaps with an extra argument such as the `id` or `rowsize` variables? This method will be generalizable to any other function we will write. We can go back to the EarthCube notebook to see how this was handled.
Could any of our analysis functions take optional arguments specifying the underlying structure of the data? I.e. `rowsize=rowsize` for ragged arrays, and `dim='obs'` or `axis=n` for structured arrays such as xarray DataArrays?
I like the `rowsize` approach.
If `rowsize` (array-like of ints, optional, default `None`) is absent, the computation defaults to the N-d structured array implementation. If `rowsize` is provided, require that `x`, `y`, and `time` are 1-d arrays and apply boundary conditions at the start and end of each segment.
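The dispatch described above could be sketched like this; the function name, defaults, and the `np.gradient` differencing scheme are illustrative assumptions, not clouddrift's actual implementation:

```python
import numpy as np

def velocity_from_position_sketch(x, y, time, rowsize=None, time_axis=-1):
    """Sketch of rowsize-based dispatch (names and scheme are illustrative)."""
    if rowsize is None:
        # N-d structured path: differentiate along time_axis; for brevity we
        # assume time is 1-d along that axis.
        u = np.gradient(np.asarray(x, dtype=float), np.asarray(time, dtype=float), axis=time_axis)
        v = np.gradient(np.asarray(y, dtype=float), np.asarray(time, dtype=float), axis=time_axis)
        return u, v
    # Ragged path: require 1-d inputs and differentiate each segment
    # separately, so one-sided differences apply at each segment's boundaries.
    x, y, time = (np.asarray(a, dtype=float) for a in (x, y, time))
    if not (x.ndim == y.ndim == time.ndim == 1):
        raise ValueError("x, y, and time must be 1-d when rowsize is given")
    bounds = np.cumsum(rowsize)[:-1]
    u = np.concatenate(
        [np.gradient(xs, ts) for xs, ts in zip(np.split(x, bounds), np.split(time, bounds))]
    )
    v = np.concatenate(
        [np.gradient(ys, ts) for ys, ts in zip(np.split(y, bounds), np.split(time, bounds))]
    )
    return u, v
```

Splitting before differencing is what enforces the per-trajectory boundary conditions: no finite difference ever straddles two trajectories.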
The `dim` argument doesn't apply since our function takes array-likes (I think this is a good design choice), and we already have `time_axis` to specify along which axis to differentiate.
If `rowsize` is provided, we can raise a warning if `time_axis` is also provided, and proceed with the computation (if we want to be more lax), or raise an error if we want to be more strict. Either is OK, it's a style choice.
@selipot and I discussed jLab's `splitcell` (IIRC) function, which splits a ragged array into a list of varying-length arrays. That's easy for us to do as well given that we have `rowsize` for housekeeping. This made me think of an alternative approach to handling ragged arrays in `velocity_from_position` and similar functions that need to be trajectory-bounds aware:
The function could accept, in addition to array-likes, lists of array-likes. If the arguments are lists of array-likes, their elements are assumed to be contiguous arrays (trajectories), and the function would recursively run itself on each element and return lists of array-likes as result.
The downside of this approach is that (I think) it would require a copy of the data in the process (ragged-array `DataArray` -> `list[DataArray]`). Another downside is that it would place the opportunity to parallelize inside the function, moving the parallelization responsibility from the user to the library. The upside is that the implementation would be very easy.
Another approach is to not touch the existing function but implement a nicely syntaxed `splitcell` (never mind, this is the same as above) and let the user do `velocity_from_position(splitcell(x), splitcell(y), splitcell(time))` and similar.
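A `splitcell`-style helper is a one-liner with numpy, given `rowsize`; the name is borrowed from jLab and the signature is hypothetical:

```python
import numpy as np

def splitcell(arr, rowsize):
    """Split a 1-d ragged array into a list of contiguous per-trajectory
    arrays, using rowsize for the segment lengths."""
    return np.split(np.asarray(arr), np.cumsum(rowsize)[:-1])
```

With this, the call above would be `velocity_from_position(splitcell(x, rowsize), splitcell(y, rowsize), splitcell(time, rowsize))`, assuming the function accepts lists of array-likes.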
This is done with `apply_ragged`.
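For reference, a minimal numpy-only sketch of the pattern `apply_ragged` implements; clouddrift's actual function has a richer signature (extra arguments, parallel execution) that is omitted here:

```python
import numpy as np

def apply_ragged_sketch(func, arrays, rowsize):
    """Apply func trajectory by trajectory and stitch the results back into
    a ragged array; a simplified stand-in for clouddrift's apply_ragged."""
    bounds = np.cumsum(rowsize)[:-1]
    chunks = [np.split(np.asarray(a, dtype=float), bounds) for a in arrays]
    return np.concatenate([func(*segs) for segs in zip(*chunks)])
```

This is the split-apply-concatenate idiom: trajectory boundaries are respected because `func` only ever sees one contiguous segment at a time.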
Previous discussion in #68.
@selipot suggested that `velocity_from_position` should also handle ragged arrays as input. Let's discuss here what these ragged arrays look like. I.e. is the ragged array in the form of an xarray Dataset as generated by clouddrift or something else?
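To make the question concrete, here is a hedged sketch of the ragged layout under discussion: 1-d observation variables holding all trajectories back to back, plus per-trajectory `id` and `rowsize` variables. The variable names follow the discussion above; the exact Dataset schema may differ:

```python
import numpy as np

# Per-trajectory variables live on a "traj" dimension; observation variables
# live on a single "obs" dimension, with trajectories concatenated.
ragged = {
    "id": np.array([1, 2]),                           # one entry per trajectory
    "rowsize": np.array([3, 2]),                      # observations per trajectory
    "lon": np.array([0.0, 0.1, 0.2, 5.0, 5.1]),       # all obs, concatenated
    "lat": np.array([10.0, 10.1, 10.2, -4.0, -4.1]),
    "time": np.array([0.0, 1.0, 2.0, 0.0, 1.0]),
}
# The invariant that makes the layout work:
assert ragged["rowsize"].sum() == ragged["lon"].size
```

Any function that consumes this layout needs `rowsize` (or equivalent trajectory-break information) to avoid differencing across trajectory boundaries.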