RobotPsychologist / bg_control

Improving short-term prandial blood glucose outcomes for people with type 1 diabetes, a complex disease that affects nearly 10 million people worldwide. We aim to leverage semi-supervised learning to identify unlabelled meals in time-series blood glucose data, develop meal-scoring functions, and explore causal machine-learning techniques.
https://blood-glucose-control.streamlit.app/

Model Development - Transformations Script #96

Open RobotPsychologist opened 3 weeks ago

RobotPsychologist commented 3 weeks ago

@y-mx @aryavkin

The idea for this ticket is to implement a function that takes the data produced from the data generation pipeline:

The above scripts are intended to facilitate the data generation and cleaning that occurs outside of the sktime library.

The transformations script will act as the connection point between the data generated above and the training pipeline. The training pipeline should be able to call the transformation script in a loop for extended training runs, where we loop through a dictionary of sktime transformation pipelines:

The transformation function itself should loop through a list of provided datasets and apply the transformations to each one. For example, we could have two otherwise identical datasets, one with a three-hour meal window and one with a five-hour meal window, and want to apply the same transformations to both for our experiments.
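A minimal sketch of the two loops described above, assuming each pipeline can be treated as a plain callable. In practice this would presumably be a wrapper around an sktime pipeline; here simple lambdas stand in for it. The names `apply_pipeline`, `run_experiments`, the dataset keys, and the pipeline labels are all hypothetical.

```python
from typing import Any, Callable

# Stand-in type for a transformation pipeline (assumption: any callable
# that maps a dataset to a transformed dataset).
Transform = Callable[[Any], Any]


def apply_pipeline(pipeline: Transform, datasets: dict[str, Any]) -> dict[str, Any]:
    """Inner loop: apply one pipeline to every provided dataset."""
    return {name: pipeline(data) for name, data in datasets.items()}


def run_experiments(
    pipelines: dict[str, Transform], datasets: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Outer loop (training side): one pass per labelled pipeline."""
    results: dict[str, dict[str, Any]] = {}
    for label, pipeline in pipelines.items():
        results[label] = apply_pipeline(pipeline, datasets)
        # A training step would consume results[label] here.
    return results


# Toy data: the same series cut to two hypothetical meal-window lengths.
datasets = {
    "meal_window_3h": [5.1, 6.3, 7.8],
    "meal_window_5h": [5.1, 6.3, 7.8, 7.2, 6.0],
}
pipelines = {
    "identity": lambda xs: list(xs),
    "first_difference": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
}
results = run_experiments(pipelines, datasets)
```

Keeping the dataset loop inside the transformation function means the training pipeline only has to iterate over pipeline labels, which matches the split of responsibilities described in the ticket.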

So one loop of the transformation script should:

  1. Check if the transformed data set already exists in: 0_meal_identification/meal_identification/data/processed
  2. If it does not exist, load the specified data set from: 0_meal_identification/meal_identification/data/interim
  3. Apply the transformation pipeline
  4. Store the transformed data for caching, if specified. For a given training run, we could create a new subdirectory named after the training run's label, and inside it a new directory for each transformation pipeline applied, e.g. 0_meal_identification/meal_identification/data/processed/{training run label}/{pipeline label}
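The four steps above could be sketched roughly as follows. This is only an illustration under several assumptions: data sets are stored as CSV files, the pipeline is a plain callable (an sktime pipeline would need a small wrapper), and the cache lookup uses the `{training run label}/{pipeline label}` layout suggested in step 4. The names `read_rows`, `write_rows`, and `transform_one` are hypothetical, and only the standard library is used so the sketch stays self-contained.

```python
import csv
from pathlib import Path
from typing import Callable

Rows = list[dict[str, str]]


def read_rows(path: Path) -> Rows:
    """Load a CSV file as a list of row dictionaries."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def write_rows(path: Path, rows: Rows) -> None:
    """Write row dictionaries to CSV, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()) if rows else [])
        writer.writeheader()
        writer.writerows(rows)


def transform_one(
    dataset_name: str,
    run_label: str,
    pipeline_label: str,
    transform: Callable[[Rows], Rows],
    interim_dir: Path,
    processed_dir: Path,
    cache: bool = True,
) -> Rows:
    """One loop iteration: reuse cached output or transform and store it."""
    out_path = processed_dir / run_label / pipeline_label / dataset_name
    # Step 1: check whether the transformed data set already exists.
    if out_path.exists():
        return read_rows(out_path)
    # Step 2: otherwise load the specified data set from the interim directory.
    rows = read_rows(interim_dir / dataset_name)
    # Step 3: apply the transformation pipeline.
    transformed = transform(rows)
    # Step 4: store the result for caching, if requested.
    if cache:
        write_rows(out_path, transformed)
    return transformed
```

In the repository, `interim_dir` and `processed_dir` would point at the `data/interim` and `data/processed` directories named in the steps; they are passed in as parameters here so the function can be exercised against a temporary directory in tests.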

Once the looping is complete, it should

Please also write tests using pydantic, as the data team did. You can reach out to them for guidance so that the tests conform to the standards they have been using. Reach out to @Tony911029 @andytubeee @Phiruby if you have questions regarding this.

@fkiraly Please let me know if this makes sense or if there are any other clarifications required.

aryavkin commented 2 weeks ago

interested

y-mx commented 2 weeks ago

add me please

RobotPsychologist commented 1 day ago

@Tony911029 and @andytubeee help with unit tests.