ai4er-cdt / WildfireDistribution

AI4EO GTC 2021/2. Private repository for group 1: determining wildfire distribution in visible remote sensing imagery.
MIT License

Sampler rejector for balanced input datasets #19

Closed. Hamish-Cam closed this issue 2 years ago.

Hamish-Cam commented 2 years ago

A possible problem: since most samples of the input data will not be classified as 'burned', our model won't see enough examples of burns to learn the important features correctly. Instead, the model will probably tend to classify most samples as non-burned, as this already gives a low loss, without ever learning to pick out burn features. (Preserving the natural relative abundance of burns in the training data is not important, since this is not a probabilistic model.)
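To make the concern concrete, here is a toy illustration with hypothetical numbers (5% burned pixels, not taken from our data). A degenerate predictor that outputs "not burned" everywhere scores well on a majority-dominated metric such as accuracy while detecting none of the burns. Accuracy is used here only as a simple stand-in for the loss.

```python
"""Toy illustration of class imbalance (hypothetical numbers only)."""

# Hypothetical pixel labels: 1 = burned, 0 = not burned (5% burned).
labels = [1] * 5 + [0] * 95

# A degenerate "model" that predicts "not burned" for every pixel.
preds = [0] * len(labels)


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def burn_recall(preds, labels):
    """Fraction of truly burned pixels that were predicted as burned."""
    burned_preds = [p for p, y in zip(preds, labels) if y == 1]
    return sum(burned_preds) / len(burned_preds)


print(f"accuracy = {accuracy(preds, labels):.2f}")        # accuracy = 0.95
print(f"burn recall = {burn_recall(preds, labels):.2f}")  # burn recall = 0.00
```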

As explained in this paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9480226 (page 364), we will make a sampler rejector: a sampler that rejects candidate samples so that every batch satisfies specific criteria. To start with, these will be:

  1. A proportion 'no_burn_prop' of the samples in each batch contain no burn at all.
  2. The remaining samples make up a proportion of 1 - 'no_burn_prop' and must meet both of the following criteria:
     a) Burn proportion > 'bp' (between 0 and 1)
     b) Water proportion < 'wp' (between 0 and 1)
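Criteria 1 and 2 can be expressed as a batch-level check. This is a minimal sketch under stated assumptions: each sample is summarised by a hypothetical `(burn_prop, water_prop)` pair giving the fraction of its pixels labelled burned and water, and "no burn" means `burn_prop == 0`. The function name `batch_meets_criteria` is illustrative, not existing project code.

```python
"""Check whether a batch meets the proposed criteria (sketch)."""

from typing import Sequence, Tuple

# Assumed per-sample summary: (burn_prop, water_prop), each in [0, 1].
Stats = Tuple[float, float]


def batch_meets_criteria(
    batch: Sequence[Stats],
    no_burn_prop: float,
    bp: float,
    wp: float,
) -> bool:
    """Return True if the batch satisfies criteria 1 and 2.

    Criterion 1: round(no_burn_prop * len(batch)) samples contain no burn.
    Criterion 2: every other sample has burn_prop > bp and water_prop < wp.
    No-burn samples are not subject to the water criterion.
    """
    n_no_burn_target = round(no_burn_prop * len(batch))
    no_burn = [s for s in batch if s[0] == 0.0]
    burned = [s for s in batch if s[0] > 0.0]
    if len(no_burn) != n_no_burn_target:
        return False
    return all(burn > bp and water < wp for burn, water in burned)


# Usage with hypothetical statistics for a batch of four samples.
example = [(0.0, 0.1), (0.0, 0.5), (0.3, 0.05), (0.2, 0.0)]
ok = batch_meets_criteria(example, no_burn_prop=0.5, bp=0.1, wp=0.2)
```

With `no_burn_prop=0.5`, a batch of four needs exactly two burn-free samples. Rejecting whole random batches with this check can be slow when valid burn samples are rare, so rejecting individual samples while filling each part of the batch is likely the more practical design.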

To be compatible with PyTorch, this must be implemented either as a new sampler or as a modified DataLoader class. Currently, I think a new sampler within TorchGeo is the way to go.
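One possible shape for such a sampler is sketched below, assuming precomputed per-sample `(burn_prop, water_prop)` statistics indexed like the dataset. `BalancedBatchSampler` and its parameters are hypothetical names. It yields lists of integer indices, and PyTorch's `DataLoader` accepts any iterable of index lists through its `batch_sampler` argument, so it could be passed as `DataLoader(dataset, batch_sampler=sampler)`. It is not a TorchGeo sampler: TorchGeo's own samplers work on geospatial queries rather than integer indices, so adapting this to TorchGeo would need further work.

```python
"""Rejection-based batch sampler (sketch, standard library only)."""

import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


class BalancedBatchSampler:
    """Yield batches of dataset indices that satisfy criteria 1 and 2.

    stats[i] = (burn_prop, water_prop) for sample i (assumed precomputed).
    Candidates are drawn uniformly at random, with replacement, and rejected
    until each batch holds n_no_burn burn-free samples and n_burn samples with
    burn_prop > bp and water_prop < wp.
    """

    def __init__(
        self,
        stats: Sequence[Tuple[float, float]],
        batch_size: int,
        num_batches: int,
        no_burn_prop: float,
        bp: float,
        wp: float,
        seed: Optional[int] = None,
        max_tries: int = 100_000,
    ) -> None:
        self.stats = stats
        self.num_batches = num_batches
        self.bp = bp
        self.wp = wp
        self.max_tries = max_tries
        self.n_no_burn = round(no_burn_prop * batch_size)
        self.n_burn = batch_size - self.n_no_burn
        self.rng = random.Random(seed)

    def _is_no_burn(self, i: int) -> bool:
        return self.stats[i][0] == 0.0

    def _is_valid_burn(self, i: int) -> bool:
        burn, water = self.stats[i]
        return burn > self.bp and water < self.wp

    def _draw(self, accept: Callable[[int], bool], k: int) -> List[int]:
        """Rejection-sample k indices that pass the `accept` predicate."""
        chosen: List[int] = []
        tries = 0
        while len(chosen) < k:
            tries += 1
            if tries > self.max_tries:
                raise RuntimeError("too many rejections; criteria may be unsatisfiable")
            i = self.rng.randrange(len(self.stats))
            if accept(i):
                chosen.append(i)
        return chosen

    def __iter__(self) -> Iterator[List[int]]:
        for _ in range(self.num_batches):
            batch = self._draw(self._is_no_burn, self.n_no_burn)
            batch += self._draw(self._is_valid_burn, self.n_burn)
            self.rng.shuffle(batch)  # avoid a fixed no-burn/burn ordering
            yield batch

    def __len__(self) -> int:
        return self.num_batches


# Usage with hypothetical per-sample statistics.
stats = [(0.0, 0.1), (0.0, 0.9), (0.4, 0.05), (0.02, 0.0), (0.6, 0.5)]
sampler = BalancedBatchSampler(stats, batch_size=4, num_batches=3,
                               no_burn_prop=0.5, bp=0.1, wp=0.2, seed=0)
batches = list(sampler)
```

Because sampling is with replacement, a rare valid burn sample can appear several times in one batch (in the usage above, index 2 is the only valid burn sample). The `max_tries` cap turns an unsatisfiable configuration into an error instead of an infinite loop.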