aeye-lab / pymovements

A python package for processing eye movement data
https://pymovements.readthedocs.io
MIT License

n_components won't be filled when initializing GazeDataFrame with nested columns #514

Closed: dkrako closed this issue 1 year ago.

dkrako commented 1 year ago

Current Behavior

The GazeDataFrame.n_components attribute is not set (it remains None) if all columns in the input dataframe are already nested.

Expected Behavior

GazeDataFrame.n_components should be inferred from the existing nested component columns in the input dataframe.


Failure Information (for bugs)


Steps to Reproduce

import numpy as np
import polars as pl
import pymovements as pm

df_orig = pl.from_numpy(np.zeros((2, 1000)), orient='col', schema=['x', 'y'])
df_orig

Out:
shape: (1_000, 2)
┌─────┬─────┐
│ x   ┆ y   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
│ …   ┆ …   │
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 │
└─────┴─────┘

gaze = pm.GazeDataFrame(df_orig, position_columns=['x', 'y'])
gaze.frame

Out:
shape: (1_000, 1)
┌────────────┐
│ position   │
│ ---        │
│ list[f64]  │
╞════════════╡
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ …          │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
└────────────┘

df_copy = gaze.frame.clone()
gaze_copy = pm.GazeDataFrame(df_copy)
gaze_copy.frame

Out: 
shape: (1_000, 1)
┌────────────┐
│ position   │
│ ---        │
│ list[f64]  │
╞════════════╡
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ …          │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
│ [0.0, 0.0] │
└────────────┘

print(gaze.n_components, gaze_copy.n_components)

Out:
2 None

Both gaze.n_components and gaze_copy.n_components should be 2.
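As a minimal sketch of the expected behavior, assuming the nested position column is available as a plain Python list of per-row lists (for example via polars' Series.to_list()), the number of components is simply the common length of those lists. The helper name expected_n_components is hypothetical and not part of pymovements.

```python
def expected_n_components(rows):
    """Return the shared list length of a nested column (hypothetical helper)."""
    # Collect the distinct list lengths; a well-formed column has exactly one.
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"expected exactly one list length, got {sorted(lengths)}")
    return lengths.pop()


# Mirrors the nested 'position' column from the output above: 1000 rows of [0.0, 0.0].
position_rows = [[0.0, 0.0] for _ in range(1000)]
print(expected_n_components(position_rows))  # 2
```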

dkrako commented 1 year ago

This should be resolved before starting on #518.

dkrako commented 1 year ago

The inference of n_components is implemented in GazeDataFrame.__init__() in a very hacky way and should be refactored.

Instead of assigning n_components in each if-branch, we should call a new method like this:

self.n_components = self._infer_n_components()

at the end of __init__(). The method would return the n_components value inferred from self.frame.

In GazeDataFrame._infer_n_components(), we should collect the set of all list lengths in the pixel, position, velocity and acceleration columns. If the GazeDataFrame was initialized correctly, the set should contain exactly one value (or none if the frame is empty).

If the set contains more than one value, the component columns of the dataframe are inconsistent, and we should raise an error.
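The proposed refactoring can be sketched roughly as follows. This is a toy stand-in, not the pymovements implementation: ToyGazeFrame is hypothetical, the frame is modeled as a dict mapping column names to lists of per-row lists instead of a polars DataFrame, only the position_columns nesting branch is shown, and returning None for an empty set is an assumption that matches the current default. The point is that __init__() assigns n_components once, at the end, from whatever component columns the frame ends up with.

```python
from typing import Dict, List, Optional, Sequence

# Nested component columns named in the issue.
COMPONENT_COLUMNS = ("pixel", "position", "velocity", "acceleration")


class ToyGazeFrame:
    """Hypothetical stand-in for GazeDataFrame; frame maps column name -> rows."""

    def __init__(
        self,
        frame: Dict[str, List],
        position_columns: Optional[Sequence[str]] = None,
    ) -> None:
        self.frame = dict(frame)
        if position_columns is not None:
            # Nest the flat columns into a single list column, row by row.
            flat = [self.frame.pop(name) for name in position_columns]
            self.frame["position"] = [list(values) for values in zip(*flat)]
        # Single assignment at the end instead of one per if-branch.
        self.n_components = self._infer_n_components()

    def _infer_n_components(self) -> Optional[int]:
        # Collect every list length found in the nested component columns.
        lengths = {
            len(row)
            for name in COMPONENT_COLUMNS
            if name in self.frame
            for row in self.frame[name]
        }
        if not lengths:
            return None  # assumption: no component columns (or no rows) -> None
        if len(lengths) > 1:
            raise ValueError(
                f"inconsistent number of components in columns: {sorted(lengths)}"
            )
        return lengths.pop()


# Flat input is nested by __init__; already-nested input takes no branch at all.
flat_gaze = ToyGazeFrame({"x": [0.0] * 1000, "y": [0.0] * 1000}, position_columns=["x", "y"])
nested_gaze = ToyGazeFrame({"position": flat_gaze.frame["position"]})
print(flat_gaze.n_components, nested_gaze.n_components)  # 2 2
```

Because the inference reads only the final frame, the already-nested case from the reproduction no longer depends on which if-branch ran.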