This will require changes to the config file and the `make_dataset` script.
The wider issue is that users need to be told that a test (hold-out, independent) dataset should be created for a true test of model skill. Validation metrics are only useful up to a point; they do not reveal how well a model performs out of distribution (different time/space/season/weather/etc than is represented in the training/validation dataset)
Eventually we could have a separate 'evaluation' dataset, where users specifically evaluate model performance against a test set of images and associated labels. Metrics would be generated that reveal skill on that test data. The test data could/should be added to over time as the ML project matures
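For illustration, here is a minimal sketch of what such an evaluation step could look like. None of these names come from Gym: `evaluate_on_test_set`, `test_ds`, and the choice of mean IoU as the metric are all assumptions.

```python
# Hypothetical sketch of an 'evaluation' step: score a trained model on a
# held-out test set with a standard segmentation metric (mean IoU).
# These names are illustrative only, not existing Gym code.
import tensorflow as tf

def evaluate_on_test_set(model, test_ds, num_classes):
    """Compute mean IoU over batches of (image, integer-label) pairs."""
    miou = tf.keras.metrics.MeanIoU(num_classes=num_classes)
    for images, labels in test_ds:
        # model outputs per-class scores; argmax gives the predicted class map
        preds = tf.argmax(model(images, training=False), axis=-1)
        miou.update_state(labels, preds)
    return miou.result().numpy()
```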
In the meantime, the README and wiki should be updated to state that users should create an independent test dataset for a true test of model skill
@ebgoldstein and I discussed this again earlier this week
`make_dataset` would need to be updated to:
- split the list of input files into train and validation sets
- write the npz files for each split into its own folder, with augmented imagery going into the train folder only

In `train_model`, we would:
- read the filenames for each split directly from the respective folders; VALIDATION_SPLIT would be removed, as would the filename shuffle (see the sketch below)
- remove MODE; there would be no need for it
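As a concrete illustration of that first point, here is a minimal sketch (not the actual Gym code; the folder names and `data_path` are assumptions) of how `train_model` could discover the files:

```python
# Sketch: read filenames for each split directly from per-split folders,
# instead of shuffling one list and slicing it with VALIDATION_SPLIT.
# The folder names 'train_data' and 'val_data' are assumptions.
import os
from glob import glob

data_path = "/my/npz4gym"  # hypothetical dataset root

train_files = sorted(glob(os.path.join(data_path, "train_data", "*.npz")))
val_files = sorted(glob(os.path.join(data_path, "val_data", "*.npz")))

# Split membership is now fixed by folder contents, so augmented train
# files can never leak into the validation set.
print(f"{len(train_files)} train files, {len(val_files)} val files")
```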
Any other thoughts, @ebgoldstein?
As presented above, it would be a moderate amount of work -- probably only 1-2 days
I could work on this next, @ebgoldstein. My proposed changes would remove some config parameters... perhaps the augmentation params could go in a different config? Happy to discuss
to me, the workflow would be (see the sketch after this list):

make_dataset:
- takes a list of files
- splits it into train/val lists (this code would move from `train_model` to `make_dataset`)
- in the npz4gym folder, creates a folder for train data and a folder for val data
- non-augmented train images are put into the train data folder
- non-augmented validation images are put into the val data folder
- train images are augmented according to the existing aug configs, and put into the train data folder
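Something like this, as a rough sketch; `augment_and_save`, the folder names, and the `val_frac` default are hypothetical stand-ins, not Gym's actual code:

```python
# Rough sketch of the proposed make_dataset workflow: split the file list
# once here (moved from train_model), write non-augmented npz files to each
# split's folder, and write augmented copies to the train folder only.
import os
import random
import shutil

def make_split_folders(files, out_dir, val_frac=0.2, seed=42):
    files = sorted(files)
    random.Random(seed).shuffle(files)  # shuffle happens here, not in train_model
    n_val = int(len(files) * val_frac)
    val_files, train_files = files[:n_val], files[n_val:]

    train_dir = os.path.join(out_dir, "train_data")
    val_dir = os.path.join(out_dir, "val_data")
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(val_dir, exist_ok=True)

    # non-augmented imagery goes into each split's folder
    for f in train_files:
        shutil.copy(f, train_dir)
    for f in val_files:
        shutil.copy(f, val_dir)

    # augmented imagery goes into the train folder only, never val;
    # 'augment_and_save' is hypothetical; Gym's existing aug configs apply here
    # for f in train_files:
    #     augment_and_save(f, train_dir)

    return train_files, val_files
```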
then in train_model:
- train_ds is made from the train folder
- val_ds is made from the val folder
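For example, a minimal tf.data sketch; the npz key names (`arr_0`, `arr_1`) and folder paths are assumptions, not Gym's actual loader:

```python
# Sketch: build separate tf.data pipelines from the two folders.
# Key names 'arr_0'/'arr_1' and the folder layout are assumptions.
import os
from glob import glob

import numpy as np
import tensorflow as tf

def _read_npz(path):
    # runs eagerly inside tf.py_function, so .numpy() is available
    with np.load(path.numpy().decode()) as d:
        return d["arr_0"].astype("float32"), d["arr_1"].astype("uint8")

def make_ds(folder, batch_size=8, shuffle=False):
    files = sorted(glob(os.path.join(folder, "*.npz")))
    ds = tf.data.Dataset.from_tensor_slices(files)
    if shuffle:
        ds = ds.shuffle(len(files))
    ds = ds.map(
        lambda p: tf.py_function(_read_npz, [p], [tf.float32, tf.uint8]),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_ds = make_ds("npz4gym/train_data", shuffle=True)  # aug + non-aug train
val_ds = make_ds("npz4gym/val_data")  # non-augmented validation only
```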
I think this would introduce no additional configs, remove the data leak between train and val, and ensure we always validate on non-augmented imagery.
oh, I see now you outlined something similar above :facepalm: ... sorry... reading it, I think we are on the same page..
I don't care too much about having a test set in gym.. I tend to do that later, but I am fine if it's incorporated too..
Yeah, I agree including a test set may be problematic for users with small datasets. Plus, the test images would likely all be drawn from the same distribution of imagery as the train and val sets, so they wouldn't be a good test of out-of-distribution application
I can make a new branch and implement this idea this week
I started a new branch and have started work on implementing this idea. More soon ...
Done. See https://github.com/Doodleverse/segmentation_gym#new-in-may-2023
commit: https://github.com/Doodleverse/segmentation_gym/commit/d11a3f63531cd9baf1575a9732dd8210781ae316
changes to:
- https://github.com/Doodleverse/segmentation_gym/blob/main/make_dataset.py
- https://github.com/Doodleverse/doodleverse_utils/blob/main/doodleverse_utils/make_mndwi_dataset.py
- https://github.com/Doodleverse/doodleverse_utils/blob/main/doodleverse_utils/make_ndwi_dataset.py
see also https://pypi.org/project/doodleverse-utils/0.0.30/
Tested on several datasets.
Create a way to add more training / validation options, including