MetOffice / XBTs_classification

Project for the classification of eXpendable Bathy Thermographs
BSD 3-Clause "New" or "Revised" License

Add metadata file for categories #21

Open stevehadd opened 4 years ago

stevehadd commented 4 years ago

Currently we have to load the whole dataset to be sure we have captured all the categories for the categorical features, because if we split the dataset by year we are not guaranteed to have all categories in every year. If we set up the encoding based on the contents of a particular year, results become hard to compare across years, since we want the encodings to be the same across the whole dataset. Instead we could create a metadata file which consolidates this information, with a list of classes for each categorical feature and the information needed for scaling the other features. This should make it easier to parallelise the code, as only this metadata file is needed at the start to set up processing pipelines in a library like dask.
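A rough sketch of what such a metadata file might look like, using only the Python standard library. Everything named here is an assumption for illustration: the feature names (`country`, `institute`, `max_depth`), the file name `xbt_metadata.json`, the JSON layout, and the choice of min-max scaling are all hypothetical and not taken from the project.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical feature names, chosen only for illustration.
CATEGORICAL = ["country", "institute"]
NUMERIC = ["max_depth"]


def build_metadata(chunks):
    """Scan per-year chunks once, collecting categories and scaling statistics."""
    categories = {name: set() for name in CATEGORICAL}
    stats = {
        name: {"min": float("inf"), "max": float("-inf"), "sum": 0.0, "count": 0}
        for name in NUMERIC
    }
    for chunk in chunks:  # each chunk is a list of records (dicts) for one year
        for record in chunk:
            for name in CATEGORICAL:
                categories[name].add(record[name])
            for name in NUMERIC:
                value = float(record[name])
                s = stats[name]
                s["min"] = min(s["min"], value)
                s["max"] = max(s["max"], value)
                s["sum"] += value
                s["count"] += 1
    return {
        # Sorted lists give every worker the same class order.
        "categories": {name: sorted(values) for name, values in categories.items()},
        "scaling": {
            name: {"min": s["min"], "max": s["max"], "mean": s["sum"] / s["count"]}
            for name, s in stats.items()
        },
    }


def encode(record, metadata):
    """One-hot encode categoricals and min-max scale numerics using only the metadata."""
    row = []
    for name in CATEGORICAL:
        classes = metadata["categories"][name]
        row.extend(1.0 if record[name] == c else 0.0 for c in classes)
    for name in NUMERIC:
        s = metadata["scaling"][name]
        span = s["max"] - s["min"]
        row.append((float(record[name]) - s["min"]) / span if span else 0.0)
    return row


# Toy data standing in for two yearly files; neither year contains every category.
year_1990 = [{"country": "UK", "institute": "A", "max_depth": 450.0}]
year_1991 = [
    {"country": "US", "institute": "A", "max_depth": 750.0},
    {"country": "JP", "institute": "B", "max_depth": 1050.0},
]

# Hypothetical file name and location for the metadata file.
metadata_path = Path(tempfile.gettempdir()) / "xbt_metadata.json"
metadata_path.write_text(json.dumps(build_metadata([year_1990, year_1991]), indent=2))

# A worker processing one year needs only the metadata file, not the other years.
loaded = json.loads(metadata_path.read_text())
encoded_1990 = [encode(record, loaded) for record in year_1990]
```

The metadata still has to be built by one pass over the full dataset, but that pass only happens when the data changes. After that, each year can be encoded independently with identical column layouts, which is what would let the per-year work be handed out to dask workers.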