aeye-lab / pymovements

A python package for processing eye movement data
https://pymovements.readthedocs.io
MIT License
57 stars 11 forks source link

make passing time array in gaze.from_numpy() more robust #640

Open dkrako opened 6 months ago

dkrako commented 6 months ago

Description of the problem

634 has exposed an issue with the way we pass the time numpy array to polars.from_numpy().

Up to now we supported 1d and 2d array for passing timesteps. But the latest polars version now doesn't seem to support passing a 2d array to from_numpy (on looking closely it is somehow recognized as an array of objects instead of a 2d array).

Description of a solution

I can think of three solutions.

  1. It could be that this gets simply resolved by a new polars version. Then we wouldn't need to do much work actually and just wait until it gets resolved upstream. I'm just unsure if this will then lead to a series of type pl.List[float] instead of just pl.Float which could lead to a problem later on for < and > comparisons.

  2. Stop supporting passing 2d arrays for the timesteps. As a user would still get a polars error it would be better to at least raise an exception in pymovements after checking the shape of the time array.

  3. Flatten time array to a 1d array. This should be done in this code block before passing it to pl.from_numpy(). https://github.com/aeye-lab/pymovements/blob/a9edc9dada4bf13ab13f034429570cb942dce3be/src/pymovements/gaze/integration.py#L231-L234 Before flattening the time array, we must make sure that all dimensions have a size of 1, except for a single dimension which is the actual timeseries axis, i.e. a shape of (1, 10) or even (10, 1, 1) should be excepted, but a shape of (2, 10) should raise an error.

I personally prefer solution 3, but as this usecase is so obscure and can be resolved by the user themselves, I think solution 2 is also viable. Solution 1 is our default until then.

Minimum acceptance criteria