Description of the problem

When working with local datasets we have to set up a bunch of different column names in the `DatasetDefinition`. It would be much nicer if standard column names could be inferred.

Description of a solution

In some cases column names could be guessed. As a starting point we could reuse our internal standard for preprocessed files:

https://github.com/aeye-lab/pymovements/blob/cb9ef9571c5b24f7609928d18efc3ae2520c1d03/src/pymovements/dataset/dataset_files.py#L283-L296

This would also very much simplify #714, as there would then be no need for an `auto_nest` argument.

So I would propose adding a new argument to the init, for instance a boolean `auto_column_detect` that defaults to `False`. The default could also be set to `True`, but that could be a breaking change, so I'm ambivalent.

For each attribute, for example `pixel`, we would then write something like this:

```python
component_suffixes = ['x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya']

# I would vote for not overwriting specified columns
if auto_column_detect and pixel_columns is None:
    column_candidates = ['pixel_' + suffix for suffix in component_suffixes]
    pixel_columns = [c for c in column_candidates if c in gaze_df.frame.columns]

# this part is from GazeDataFrame.__init__() and is false if the list is empty
if pixel_columns:
    self._check_component_columns(pixel_columns=pixel_columns)
    self.nest(pixel_columns, output_column='pixel')
    column_specifiers.append(pixel_columns)
```

This is flexible enough for extending the `column_candidates` in a potential follow-up.

Minimum acceptance criteria

- [x] auto-detect columns if they adhere to the internal column naming standard for preprocessed csv files
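
To make the detection step from the proposed solution easier to discuss, here is a minimal standalone sketch that works on a plain list of column names instead of a `GazeDataFrame`. The helper name `detect_component_columns`, its signature, and the `position` and `velocity` attribute names are assumptions made for illustration; only the suffix list, the `auto_column_detect` flag, and the `pixel` attribute come from the proposal itself.

```python
from __future__ import annotations

from collections.abc import Iterable

# Suffixes taken from the snippet in the proposed solution.
COMPONENT_SUFFIXES = ('x', 'y', 'xl', 'yl', 'xr', 'yr', 'xa', 'ya')


def detect_component_columns(
        available_columns: Iterable[str],
        attribute: str,
        specified_columns: list[str] | None = None,
        auto_column_detect: bool = False,
) -> list[str] | None:
    """Hypothetical helper: pick the component columns for one attribute.

    Explicitly specified columns are never overwritten. Detection only
    runs if the flag is set and no columns were specified.
    """
    if specified_columns is not None or not auto_column_detect:
        return specified_columns

    available = set(available_columns)
    candidates = [f'{attribute}_{suffix}' for suffix in COMPONENT_SUFFIXES]
    # Keep the candidate order so that components stay in x, y, xl, ... order.
    return [c for c in candidates if c in available]


# Example column names; 'position' and 'velocity' are assumed attribute names.
columns = ['time', 'pixel_x', 'pixel_y', 'position_xl', 'position_yl']

pixel_columns = detect_component_columns(columns, 'pixel', auto_column_detect=True)
position_columns = detect_component_columns(columns, 'position', auto_column_detect=True)
velocity_columns = detect_component_columns(columns, 'velocity', auto_column_detect=True)

print(pixel_columns)     # ['pixel_x', 'pixel_y']
print(position_columns)  # ['position_xl', 'position_yl']
print(velocity_columns)  # []
```

An empty list is falsy, so an attribute without matching columns would simply skip the checking and nesting step, as in the snippet from the proposal. Passing the attribute name as a parameter is one way to avoid repeating the same block per attribute, though the actual implementation may well look different.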