aeye-lab / pymovements

A python package for processing eye movement data
https://pymovements.readthedocs.io
MIT License

auto column detection when initializing gaze #715

Open dkrako opened 2 months ago

dkrako commented 2 months ago

Description of the problem

When working with local datasets, we have to set up a bunch of different column names in the DatasetDefinition. It would be much nicer if standard column names could be inferred.

Description of a solution

In some cases column names could be guessed. As a starting point we could reuse our internal standard for preprocessed files:

https://github.com/aeye-lab/pymovements/blob/cb9ef9571c5b24f7609928d18efc3ae2520c1d03/src/pymovements/dataset/dataset_files.py#L283-L296

This would also very much simplify #714 as there's no need for an auto_nest argument then.

So I would propose to add a new argument to the init, for instance:

class GazeDataFrame:
    def __init__(
        ...
        auto_column_detect: bool = False,
        ...
    ):

The default could also be set to True, but that could be a breaking change, so I'm ambivalent.

For each attribute, for example pixel, we would then write something like this:

component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']

if auto_column_detect and pixel_columns is None:  # I would vote for not overwriting specified columns
    column_candidates = ['pixel_' + suffix for suffix in component_suffixes]
    pixel_columns = [c for c in column_candidates if c in gaze_df.frame.columns]

if pixel_columns:  # this part is from GazeDataFrame.__init__() and is false if the list is empty
    self._check_component_columns(pixel_columns=pixel_columns)
    self.nest(pixel_columns, output_column='pixel')
    column_specifiers.append(pixel_columns)

This is flexible enough to allow extending the column_candidates in a potential follow-up.
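
To illustrate how the same detection could be shared across attributes instead of being repeated for each one, here is a rough standalone sketch. The helper name detect_component_columns and its parameters are hypothetical and not part of pymovements; it works on plain column name lists, so it makes no assumptions about the actual GazeDataFrame internals.

```python
from __future__ import annotations

from collections.abc import Iterable

# Suffixes taken from the proposal above.
COMPONENT_SUFFIXES = ('x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya')


def detect_component_columns(
        available_columns: Iterable[str],
        prefix: str,
        explicit_columns: list[str] | None = None,
        auto_column_detect: bool = False,
) -> list[str] | None:
    """Hypothetical helper: keep explicit columns, else detect '<prefix>_<suffix>' names."""
    # Explicitly specified columns are never overwritten, and nothing is
    # guessed unless auto detection is switched on.
    if explicit_columns is not None or not auto_column_detect:
        return explicit_columns

    available = set(available_columns)
    candidates = [f'{prefix}_{suffix}' for suffix in COMPONENT_SUFFIXES]
    # Preserve the candidate order so x components come before y components.
    return [c for c in candidates if c in available]


columns = ['time', 'pixel_x', 'pixel_y', 'position_xl', 'position_yl']
pixel_columns = detect_component_columns(columns, 'pixel', auto_column_detect=True)
position_columns = detect_component_columns(columns, 'position', auto_column_detect=True)
velocity_columns = detect_component_columns(columns, 'velocity', auto_column_detect=True)
```

An empty result (as for velocity here) is falsy, so the existing `if pixel_columns:` style checks would simply skip nesting for attributes that have no matching columns.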

Minimum acceptance criteria

prassepaul commented 2 months ago

fixed by #719