bdzyubak / tensorflow-sandbox

A repository for studying applications of Deep Learning across fields and demonstrating samples of my code and project management

Make class to prepare data for training based on standard formats #10

Closed bdzyubak closed 2 years ago

bdzyubak commented 2 years ago

There are several standard formats for input data. For example, segmentation tasks often have a directory of images and a directory of masks with the same number of files. Classification tasks may have a directory of images (or other data) and a CSV or text file with labels. For ease of use and maintainability, a class should be implemented that can read, preprocess, and augment input datasets. This is much easier to maintain than copying training scripts and modifying them for each problem, and it allows more advanced preprocessing methods to be used widely.

bdzyubak commented 2 years ago

A prep_training_data class has been implemented which supports segmentation tasks. Data is expected to be in ./data/images and ./data/masks. Batch size, prefetching, and rescaling have default values and do not need to be specified. Over time, more set methods will be implemented to toggle additional functions. 7e8c70bd0bfb52bc88ffcf33e66f06bd0a83f88f
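To illustrate the layout check described above, here is a minimal sketch that pairs files from the images and masks directories. It is not the repository's implementation: the function name `pair_images_and_masks` is hypothetical, and it assumes that an image and its mask share a file stem.

```python
from pathlib import Path


def pair_images_and_masks(data_dir):
    """Return (image_path, mask_path) pairs from data_dir/images and data_dir/masks."""
    data_dir = Path(data_dir)
    # Sort by stem so that images and masks line up even if extensions differ.
    images = sorted(
        (p for p in (data_dir / "images").iterdir() if p.is_file()),
        key=lambda p: p.stem,
    )
    masks = sorted(
        (p for p in (data_dir / "masks").iterdir() if p.is_file()),
        key=lambda p: p.stem,
    )

    # The two directories are expected to hold the same number of files.
    if len(images) != len(masks):
        raise ValueError(
            f"Found {len(images)} images but {len(masks)} masks in {data_dir}"
        )

    # Assumption: an image and its mask share a file stem (e.g. 001.jpg and 001.png).
    for image, mask in zip(images, masks):
        if image.stem != mask.stem:
            raise ValueError(f"Unmatched pair: {image.name} vs {mask.name}")

    return list(zip(images, masks))
```

Failing early on a count or name mismatch keeps a silent image/mask misalignment from reaching training.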

bdzyubak commented 2 years ago

The data import can now be set up in as few as two lines:

dataset = ImgMaskDataset(os.path.join(top_path,'data'))
dataset.prep_data_img_labels()

Set methods for prefetching, batch size, and image rescaling are available, and many more will be added later.
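The pattern of defaults plus set methods can be sketched roughly as follows. This is an illustration only: the class name `LoaderSettings`, the method names, and the default values are all hypothetical and are not taken from the repository.

```python
class LoaderSettings:
    """Hypothetical holder for loader options; every option has a default."""

    def __init__(self):
        self.batch_size = 32      # assumed default, not taken from the repository
        self.prefetch = True      # assumed default
        self.rescale = 1.0 / 255  # assumed default: map 8-bit pixels into [0, 1]

    def set_batch_size(self, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        self.batch_size = batch_size
        return self  # returning self lets calls be chained

    def set_prefetch(self, enabled):
        self.prefetch = bool(enabled)
        return self

    def set_rescale(self, factor):
        if factor <= 0:
            raise ValueError("rescale factor must be positive")
        self.rescale = factor
        return self
```

With this shape, a caller who is happy with the defaults writes `LoaderSettings()`, while one who needs a smaller batch writes `LoaderSettings().set_batch_size(8)` and leaves everything else untouched.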

bdzyubak commented 2 years ago

The class now also supports classification-type data, with images in /data/images and labels in /data/excel_with_labels.csv.
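As a rough sketch of how such a layout could be read, the snippet below matches each CSV row to a file in the images directory. The function name `load_image_labels` is hypothetical, and the CSV header `filename,label` is an assumption, since the thread does not describe the column layout.

```python
import csv
from pathlib import Path


def load_image_labels(data_dir, csv_name="excel_with_labels.csv"):
    """Return (image_path, label) pairs, one per row of the labels CSV."""
    data_dir = Path(data_dir)
    image_dir = data_dir / "images"
    samples = []

    with open(data_dir / csv_name, newline="") as handle:
        # Assumed header: filename,label (not specified in the thread).
        for row in csv.DictReader(handle):
            image_path = image_dir / row["filename"]
            if not image_path.is_file():
                raise FileNotFoundError(f"No image for CSV row: {row['filename']}")
            samples.append((image_path, row["label"]))

    return samples
```

Labels are returned as strings; mapping them to integer class indices is left to the caller.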

Extensions to other data formats and data preprocessing will be implemented in future enhancements. 6fd7f4f8b50ad9a5038a93119a84b60cecb2017b