Some potential ways to handle the extra data to improve model performance:
[ ] Loss functions that handle foreground/background classes properly
  [ ] Focal Loss
  [ ] Dice Loss
[ ] Self-supervised pre-training
  [ ] Develop pretext tasks that make use of the extra data, generate useful embeddings on all the given data, and then fine-tune on images with burned areas only
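
As a rough illustration of the two loss functions listed above, here is a minimal, framework-free sketch of binary focal loss and soft Dice loss over flattened per-pixel predictions. The function names, the flat-list inputs, and the default values (`gamma=2.0`, `alpha=0.25`, `smooth=1.0`) are assumptions made for this example, not settings tuned for this dataset; a real training loop would use the tensor equivalents in whichever deep learning library the project uses.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat per-pixel probabilities.

    probs  : predicted probability of the burned class for each pixel
    labels : 1 for burned, 0 for unburned
    gamma  : focusing parameter; larger values down-weight easy pixels more
    alpha  : weight on the burned class (1 - alpha goes to unburned)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p   # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed overlap of prediction and mask.

    The smooth term keeps the loss defined (and equal to 0) for tiles
    with no burned pixels when the prediction is also all zeros.
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


# Hypothetical toy tile: 1 burned pixel out of 10.
labels = [1] + [0] * 9
probs = [0.6] + [0.1] * 9
print("focal:", focal_loss(probs, labels))
print("dice: ", dice_loss(probs, labels))
```

Focal loss keeps the per-pixel cross-entropy form but shrinks the contribution of pixels the model already classifies confidently, which are mostly unburned background here. Dice loss scores overlap at the mask level, so it is less sensitive to the sheer count of background pixels.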
The extra Sentinel-2 imagery dataset provided at https://huggingface.co/datasets/chabud-team/chabud-extra does not contain any burned areas, according to https://huggingface.co/datasets/chabud-team/chabud-extra/discussions/1. If we include this dataset in training, the ratio of burned to unburned pixels will be severely imbalanced.