Description of the problem
Depending on the type of the file we need to call either gaze.from_asc(), gaze.from_csv() or gaze.from_ipc(). For future programmatic pipeline definitions, ideally written in a more general, filetype-independent way, a single function would be much nicer to use instead.
Description of a solution
The difficulty lies in resolving the completely different signatures, but we could just refactor dataset.dataset_files.load_gaze_file() into gaze.from_file() and reuse the DatasetDefinition. This would be sufficient for the ongoing work on general pipelines and, if desired, this can be refined with additional arguments in follow-up work.
One additional difficulty lies in resolving the if preprocessed branch in dataset.dataset_files.load_gaze_file().
I would propose to use a more descriptive argument instead of preprocessed, for example auto_nest: bool, and put the additional nesting at the end of from_file() regardless of the filetype.
Proposed signature:
But then I see that passing the column name attributes from the DatasetDefinition also wouldn't work.
So basically we need to find a way to get rid of the if preprocessed branch of code: https://github.com/aeye-lab/pymovements/blob/cb9ef9571c5b24f7609928d18efc3ae2520c1d03/src/pymovements/dataset/dataset_files.py#L270-L330
A good alternative could be to move the if preprocessed branch to load_gaze_files() instead of having it in load_gaze_file(). Maybe that's the cleanest way for now, as it doesn't break any behavior and buys us some time to decide how we handle the nesting issue in cached preprocessed CSV files. In the long run the issue could be resolved further by adding auto column detection during initialization of a GazeDataFrame.
Minimum acceptance criteria
[ ] write gaze.from_file()
[ ] refactor dataset.dataset_files.load_gaze_files() to use gaze.from_file()