Closed by fkiraly 1 day ago
Thanks for the feedback @fkiraly!
So, instead of having a dataset.py script for each project, should I have one for all projects in the bg_control/data folder? I was initially planning to have the script with classes and functions available to use in 0_meal_identification/meal_identification/meal_identification/dataset.py (with a similar file for each project), and the data was going to be stored in:
@andytubeee if you're interested.
@andytubeee @Phiruby @Tony911029 please check if there is anything that Franz mentioned here that might also be included in the data work as an enhancement.
Short review of data loaders, with the aim to ensure easy usability by other project members and maintainability.
Referring to the current state of `1.01-cjr-change-point-index-creator.ipynb`, top cell.

Main comments:
- The data loaders should be in `py` files, possibly somewhere in the `data` folder. I would also organize the repository so that all data concerns are separate, possibly also separating processed data from raw data. (More generally, raw data above a certain total size, perhaps 10 MB, should not be in GitHub repos unless they are dedicated data repos, as it clutters the repo.)
- Regarding the data loader `py` file, I would strongly recommend refactoring it. At the moment it is a monolithic end-to-end function with subroutines inside. I would suggest a design that separates (a) switching between file and in-memory from (b) the processing pipeline; the processing is also a loop over files, so I would make it "process a single file". More precisely, I would suggest to split things as follows (see the first sketch after this list for one possible such split).
- I would also recommend adding tests and input validation, e.g., with `pytest` and `pydantic` or similar (see the second sketch below for an illustrative example).
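
For illustration only, here is a minimal sketch of the kind of split meant in the second bullet, assuming pandas DataFrames and hypothetical function names (`load_raw`, `process_single_frame`, `save_processed`, `run_pipeline`); this is not the project's actual code:

```python
from pathlib import Path

import pandas as pd


def load_raw(path: Path) -> pd.DataFrame:
    """I/O concern only: read one raw file into memory."""
    return pd.read_csv(path)


def process_single_frame(df: pd.DataFrame) -> pd.DataFrame:
    """In-memory processing of a single dataset; no file handling here."""
    out = df.copy()
    # ... transformations, e.g. timestamp parsing or change-point index creation ...
    return out


def save_processed(df: pd.DataFrame, path: Path) -> None:
    """I/O concern only: write one processed dataset."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)


def run_pipeline(raw_dir: Path, processed_dir: Path) -> None:
    """Thin loop over files; all real logic lives in process_single_frame."""
    for raw_path in sorted(raw_dir.glob("*.csv")):
        processed = process_single_frame(load_raw(raw_path))
        save_processed(processed, processed_dir / raw_path.name)
```

With this shape, the in-memory path and the file path share the same `process_single_frame`, and tests can exercise the processing logic without touching disk.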
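And a minimal sketch of the `pytest`/`pydantic` idea from the last bullet, with a hypothetical row schema (`GlucoseRecord`) whose field names are purely illustrative, not the project's actual schema:

```python
from datetime import datetime

import pandas as pd
from pydantic import BaseModel, Field


class GlucoseRecord(BaseModel):
    """Expected shape of one processed row; field names are illustrative."""
    timestamp: datetime
    glucose_mg_dl: float = Field(gt=0)


def validate_frame(df: pd.DataFrame) -> None:
    """Raise a pydantic ValidationError if any row violates the schema."""
    for row in df.to_dict(orient="records"):
        GlucoseRecord(**row)


# pytest-style check, e.g. in tests/test_dataset.py
def test_processed_rows_match_schema():
    processed = pd.DataFrame(
        {"timestamp": ["2024-01-01T08:00:00"], "glucose_mg_dl": [110.0]}
    )
    validate_frame(processed)  # a malformed row would raise ValidationError
```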