
NRCan / geo-deep-learning

Deep learning applied to georeferenced datasets

https://geo-deep-learning.readthedocs.io/en/latest/

MIT License


Dataset split. #511

Status: Open. Abdielfer opened this issue 1 year ago.

Abdielfer commented 1 year ago

I noticed that the percentages in the split dataset do not match the expected proportions. It seems the split is made before the patches are filtered by min_annot_perc.

Abdielfer commented 1 year ago

More details: when I split by percentage, I expect %val + %trn to add up to the total number of patches. Instead, I get fewer tiles than expected in the validation set, the training set, or both. It seems the split is made from the full set of patches, and each split is then filtered by the minimum annotation percentage (min_annot_perc). This ordering drops tiles after the proportions have already been fixed, which causes the mismatch between the expected and actual number of tiles per split.
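A toy sketch of the two orderings, assuming each patch is reduced to a hypothetical (patch_id, annotated_percent) record and that the split is a plain, non-shuffled fractional cut. The names split, keep, split_then_filter and filter_then_split are illustrative only and are not geo-deep-learning functions.

```python
from typing import List, Tuple

# Hypothetical patch record: (patch_id, annotated_percent).
# This is NOT the geo-deep-learning data model, just enough to show the ordering issue.
Patch = Tuple[int, float]


def split(patches: List[Patch], val_frac: float) -> Tuple[List[Patch], List[Patch]]:
    """Cut the list into (trn, val); the last val_frac of the patches go to val.
    No shuffling, so the arithmetic is easy to follow."""
    n_val = round(len(patches) * val_frac)
    cut = len(patches) - n_val
    return patches[:cut], patches[cut:]


def keep(patches: List[Patch], min_annot_perc: float) -> List[Patch]:
    """Drop patches whose annotated percentage is below the threshold."""
    return [p for p in patches if p[1] >= min_annot_perc]


def split_then_filter(patches, val_frac, min_annot_perc):
    # Ordering suspected in the issue: proportions are fixed first,
    # then each split independently loses its under-annotated patches.
    trn, val = split(patches, val_frac)
    return keep(trn, min_annot_perc), keep(val, min_annot_perc)


def filter_then_split(patches, val_frac, min_annot_perc):
    # Alternative ordering: filter first, so val_frac applies to the patches actually kept.
    return split(keep(patches, min_annot_perc), val_frac)


if __name__ == "__main__":
    # 10 toy patches: the first five are annotated, the last five are empty.
    patches = [(i, perc) for i, perc in enumerate([50, 60, 70, 80, 90, 0, 0, 0, 0, 0])]
    for fn in (split_then_filter, filter_then_split):
        trn, val = fn(patches, val_frac=0.2, min_annot_perc=10)
        print(fn.__name__, len(trn), len(val))
    # prints:
    # split_then_filter 5 0
    # filter_then_split 4 1
```

With the split made first, the two validation patches happen to be empty and are both dropped, so validation ends up with 0 tiles instead of 20% of the usable ones. Filtering first applies the 80/20 ratio to the 5 patches that survive. The toy layout (empty patches clustered at the end) exaggerates the effect; with a shuffled split the drift would be smaller but still unpredictable from run to run.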