Open dkrako opened 2 months ago
maybe instead of defining the dataset type via datatype
, it would make more sense to define properties like has_gaze_files
and has_event_files
(and even has_stimulus_files
)? Maybe this could be more flexible in the future for further enhancements (think of has_eeg_files
).
following discussion point:
we already have events in the load definition:
def load(
self,
events: bool = False,
preprocessed: bool = False,
subset: dict[str, float | int | str | list[float | int | str]] | None = None,
events_dirname: str | None = None,
preprocessed_dirname: str | None = None,
extension: str = 'feather',
) -> Dataset:
should I change those to has_gaze_files, has_eventsfiles, has... ?
I think the signature of Dataset.load()
is a bit different. While DatasetDefinition.has_event_files
indicates if there are event files in the downloaded dataset to begin with, the events
argument in Dataset.load()
signifies if event data should be loaded (these can be events detected by pymovements and then saved with Dataset.save()
).
I wouldn't rename the load()
arguments (for now), as we don't need that breaking change. We can discuss if we want to rename/deprecate the argument in a followup PR.
I would propose that the events
argument can assume three values:
True
: try to load event filesFalse
don't load any event filesNone
or 'auto'
: if DatasetDefinition.has_event_files
is True
, load event files, if False
don't load any event files
Description of the problem
There are a lot of datasets published that only include event data and no raw gaze samples.
It would be great if the dataset library could also support these.
Description of a solution
The
DatasetDefinition
should probably extended with a new attribute that signifies the type of data to expect.My first proposal would be
datatype: Literal['raw', 'events']
. Let's brainstorm if we can find a better name.Additionally, the
Dataset.load()
method must be changed accordingly:https://github.com/aeye-lab/pymovements/blob/cb9ef9571c5b24f7609928d18efc3ae2520c1d03/src/pymovements/dataset/dataset.py#L77-L135
Currently, it always runs
Dataset.load_gaze_files()
, but this should then be dependent onDatasetDefinition.datatype
. Also, events should be set toevents: bool | None = None
and later assume a boolean value dependent onDatasetDefinition.datatype
I propose the following signature:
This would be backwards compatible:
Minimum acceptance criteria
DatasetDefinition
to indicate type of datasetDataset.load()
default values to decide loading gaze or events during runtime