bdzyubak / tensorflow-sandbox

A repository for studying applications of Deep Learning across fields, and demonstrating samples of my code and project management

Standardize data fetching/preprocessing across segmentation and classification tasks #15

Open bdzyubak opened 2 years ago

bdzyubak commented 2 years ago

To implement the data fetching module (Source/shared_utils//prep_training_data.py), the segmentation and classification child classes were put together from open source code. As a result, they differ in how data is fetched, how images are preprocessed, and how train/test data is split. The intent is for these steps to be identical by default, with separate methods only where needed. Update the code accordingly.

bdzyubak commented 2 years ago

One valid difference that will not be standardized is the train/test split. Segmentation uses a random split. Classification uses sklearn.model_selection.train_test_split, which (with its stratify argument) keeps the training and validation sets balanced with respect to labels. One could conceive of a similar split for segmentation masks: the fraction of the image covered by the mask could be quantized into bins, and those bins used as labels to class-balance the sets.
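A rough sketch of that idea, using only the standard library so it runs anywhere. The function names (`mask_coverage`, `coverage_bin`, `stratified_split`), the equal-width bins, and the default of 4 bins are all assumptions for illustration and are not taken from the repository. The same bin labels could instead be passed as the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict


def mask_coverage(mask):
    """Fraction of nonzero pixels in a 2D mask given as nested lists."""
    pixels = [p for row in mask for p in row]
    return sum(1 for p in pixels if p) / len(pixels)


def coverage_bin(coverage, n_bins=4):
    """Quantize a coverage fraction in [0, 1] into an integer bin label."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    # min() keeps coverage == 1.0 inside the last bin.
    return min(int(coverage * n_bins), n_bins - 1)


def stratified_split(coverages, test_fraction=0.2, n_bins=4, seed=0):
    """Return (train_indices, test_indices), split separately per coverage bin."""
    rng = random.Random(seed)

    # Group sample indices by their quantized coverage.
    by_bin = defaultdict(list)
    for index, coverage in enumerate(coverages):
        by_bin[coverage_bin(coverage, n_bins)].append(index)

    train, test = [], []
    for label in sorted(by_bin):
        indices = by_bin[label]
        rng.shuffle(indices)
        # Take the same fraction from every bin so both sets share the
        # coverage distribution (hypothetical policy, not the repo's).
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy data: ten nearly empty masks and ten mostly filled ones.
    coverages = [0.05] * 10 + [0.6] * 10
    train_idx, test_idx = stratified_split(coverages, test_fraction=0.2)
    print(len(train_idx), len(test_idx))
```

Equal-width bins are the simplest choice. If most masks cover only a small part of the image, quantile-based bins may give more evenly populated classes, and bins with very few samples may need to be merged before splitting.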