This project aims to improve short-term prandial blood glucose outcomes for people with type 1 diabetes, a complex disease that affects nearly 10 million people worldwide. We aim to leverage semi-supervised learning to identify unlabelled meals in time-series blood glucose data, develop meal-scoring functions, and explore causal machine-learning techniques.
The idea for this ticket is to implement a function that takes the data produced by the data generation pipeline. The above scripts are intended to facilitate the data generation and cleaning that occur outside of the sktime library.
The transformations script will operate as the connection point between the data generated above and the training pipeline. The training pipeline should be able to call the transformation script in a loop for extended training runs, where we loop through a dictionary of sktime transformation pipelines:
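For concreteness, here is a minimal sketch of what that dictionary could look like; the transformer choices, labels, and the `apply_transformations` name are illustrative assumptions, not a fixed design:

```python
# Illustrative sketch only: the transformers and labels here are
# placeholders, not the pipelines we will actually use.
from sktime.transformations.series.detrend import Detrender
from sktime.transformations.series.exponent import ExponentTransformer

transformation_pipelines = {
    "detrend": Detrender(),
    # sktime transformers compose into a pipeline with the * operator
    "detrend_then_square": Detrender() * ExponentTransformer(power=2),
}

# The training pipeline loops over the dictionary, handing each pipeline
# to the transformation function sketched further down in this ticket.
for pipeline_label, pipeline in transformation_pipelines.items():
    transformed = apply_transformations(
        dataset_labels=["meals_3hr_window", "meals_5hr_window"],
        pipeline_label=pipeline_label,
        pipeline=pipeline,
        run_label="run_001",
    )
```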
The transformation function itself should contain a looping mechanism that iterates through a list of provided datasets and applies the transformations to each. For example, we could have two otherwise identical datasets, one with a three-hour meal window and one with a five-hour meal window, and we would want to apply the same transformations to both for our experiments.
So one loop of the transformation script should (see the sketch after the lists below):

1. Check whether the transformed dataset already exists in `0_meal_identification/meal_identification/data/processed`.
2. If it does not exist, load the specified dataset from `0_meal_identification/meal_identification/data/interim`.
3. Apply the transformation pipeline.
4. Store the transformed data for caching, if specified. For a given training run, we could create a new subdirectory named after the training run's label, containing a directory for each transformation pipeline applied, e.g. `0_meal_identification/meal_identification/data/processed/{training run label}/{pipeline label}`.
Once the looping is complete, it should:

- Return a dictionary of the transformed dataset(s), if specified.
- Record logs of the transformed data (perhaps via an external log recorder function; we don't need to write this right now, just have the function set up to call an external logger).
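Putting the steps above together, here is a minimal sketch of one loop, assuming the directory layout above; the function name, arguments, and CSV storage format are assumptions to be refined:

```python
# Minimal sketch of one loop of the transformations script. Names,
# arguments, and the CSV format are placeholder assumptions.
import logging
from pathlib import Path

import pandas as pd

PROCESSED_DIR = Path("0_meal_identification/meal_identification/data/processed")
INTERIM_DIR = Path("0_meal_identification/meal_identification/data/interim")

logger = logging.getLogger(__name__)


def apply_transformations(
    dataset_labels,   # e.g. ["meals_3hr_window", "meals_5hr_window"]
    pipeline_label,
    pipeline,         # an sktime transformation pipeline
    run_label,
    cache=True,
    return_data=True,
):
    transformed = {}
    for dataset_label in dataset_labels:
        out_dir = PROCESSED_DIR / run_label / pipeline_label
        out_path = out_dir / f"{dataset_label}.csv"

        # 1. Reuse the cached result if it already exists in processed/.
        if out_path.exists():
            data = pd.read_csv(out_path, index_col=0)
        else:
            # 2. Otherwise load the specified dataset from interim/ ...
            data = pd.read_csv(INTERIM_DIR / f"{dataset_label}.csv", index_col=0)
            # 3. ... apply the transformation pipeline ...
            data = pipeline.fit_transform(data)
            # 4. ... and cache the result under {run label}/{pipeline label}.
            if cache:
                out_dir.mkdir(parents=True, exist_ok=True)
                data.to_csv(out_path)

        if return_data:
            transformed[dataset_label] = data

        # Placeholder hook for the external log recorder mentioned above.
        logger.info("transformed %s with %s", dataset_label, pipeline_label)

    return transformed if return_data else None
```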
Please also write tests using pydantic, like the data team did; you can reach out to them for guidance on this to conform to the standards they have been using, for consistency. Reach out to @Tony911029 @andytubeee @Phiruby if you have questions regarding this.
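As a starting point, a test along these lines might work, assuming pydantic v2; the model fields and values are hypothetical placeholders, so please align them with the data team's existing schemas:

```python
# Hypothetical sketch of a pydantic-based test; the record fields and
# labels are placeholders, not the data team's actual schema.
from pydantic import BaseModel, field_validator


class TransformedDatasetRecord(BaseModel):
    dataset_label: str
    pipeline_label: str
    n_rows: int

    @field_validator("n_rows")
    @classmethod
    def rows_nonempty(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("transformed dataset must not be empty")
        return v


def test_transformed_dataset_record():
    record = TransformedDatasetRecord(
        dataset_label="meals_3hr_window",
        pipeline_label="detrend",
        n_rows=1440,
    )
    assert record.n_rows > 0
```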
@fkiraly Please let me know if this makes sense or if there are any other clarifications required.
@y-mx @aryavkin