Add ResNet-based UNet as a comparison baseline for xView2

VMarsocci / pangaea-bench

Towards Robust Evaluation for Geospatial Foundation Models

GNU General Public License v3.0


So it turns out none of the existing foundation models work well for xView2 building damage detection: with DiffUNet, results are around 50% validation mIoU.

The baseline to compare against would be an ImageNet-pretrained ResNet, for example this guy right here.

Our current architecture, which splits every model into an encoder and a decoder, fits awkwardly with something like this, because here the decoder weights are an integral part of the model. The plan is to just cut off the final ResNet layer of the model and hand its features to the "decoder".
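A minimal sketch of what the "decoder" would receive, assuming a torchvision-style ResNet where cutting off the final layer means dropping the global pooling and fully connected classifier, and the four residual stages (`layer1` to `layer4`) serve as UNet skip connections. The helper `encoder_feature_shapes` is hypothetical; the channel counts and strides are those of the standard ResNet variants, and the 1024×1024 input size is an assumption about the xView2 tiles.

```python
from typing import List, Tuple

# Output strides of the four residual stages (layer1..layer4) relative to the input.
STAGE_STRIDES = (4, 8, 16, 32)

# Output channels of layer1..layer4 for common ResNet depths.
STAGE_CHANNELS = {
    "resnet18": (64, 128, 256, 512),
    "resnet34": (64, 128, 256, 512),
    "resnet50": (256, 512, 1024, 2048),
    "resnet101": (256, 512, 1024, 2048),
}


def encoder_feature_shapes(arch: str, height: int, width: int) -> List[Tuple[int, int, int]]:
    """(channels, height, width) of each stage a UNet decoder would use as a skip connection.

    Assumes the pooling + fully connected classifier has been removed, so
    layer4 is the deepest feature map handed to the decoder.
    """
    shapes = []
    for channels, stride in zip(STAGE_CHANNELS[arch], STAGE_STRIDES):
        # Padded stride-2 convolutions and pooling round up, giving ceil(size / stride).
        shapes.append((channels, -(-height // stride), -(-width // stride)))
    return shapes
```

For example, `encoder_feature_shapes("resnet50", 1024, 1024)` returns `[(256, 256, 256), (512, 128, 128), (1024, 64, 64), (2048, 32, 32)]`, so the decoder has to accept four feature maps with different channel counts and resolutions.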

Sebastian H. has also implemented an importance cropping augmentation that is integral to the damage detection results. Sebastian G. has implemented several tricks that worked for xView2 in a different project but don't lead to any improvement with our geofm UNet baseline. So the current best guess is that pretraining or architecture is what's blocking our progress.
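The issue does not describe how the importance cropping works. As an assumption only, one common way to build such an augmentation is to sample a crop centre with probability proportional to a per-class weight, so that rare damage classes show up in more crops, and then clamp the window to the image. `importance_crop_origin` and its weighting scheme are hypothetical and not Sebastian H.'s implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple


def importance_crop_origin(
    mask: Sequence[Sequence[int]],
    crop_size: int,
    class_weights: Dict[int, float],
    rng: random.Random,
) -> Tuple[int, int]:
    """Pick the top-left corner of a square crop, favouring pixels of weighted classes.

    Each pixel is a candidate crop centre with probability proportional to the
    weight of its class; the crop is then shifted so it stays inside the image.
    """
    height, width = len(mask), len(mask[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop larger than image")
    coords: List[Tuple[int, int]] = []
    weights: List[float] = []
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            w = class_weights.get(label, 0.0)
            if w > 0:
                coords.append((y, x))
                weights.append(w)
    if not coords:
        # No weighted pixels in this tile: fall back to a uniformly random crop.
        return rng.randint(0, height - crop_size), rng.randint(0, width - crop_size)
    cy, cx = rng.choices(coords, weights=weights, k=1)[0]
    half = crop_size // 2
    # Clamp so the window [top, top + crop_size) lies within the image.
    top = min(max(cy - half, 0), height - crop_size)
    left = min(max(cx - half, 0), width - crop_size)
    return top, left
```

Giving background a weight of zero, as in a call like `importance_crop_origin(mask, 512, {1: 1.0, 2: 4.0, 3: 4.0, 4: 2.0}, random.Random(0))`, forces every crop to contain at least one building pixel; the specific weights here are illustrative.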

I started by training UPerNets on top of the pretrained foundation models, but they all perform poorly. Some collapse to always predicting the background class, at around 20% mIoU; ScaleMAE performs best with almost 50% mIoU, whereas the competition winner's solution gets ~80%. Our simplified solution, which we derived from the winning solution some time ago and which has nothing to do with the current geofm project, achieves ~74%. So we'd like to see at least 65% in this repo.
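A worked check of why a model that always predicts background lands near 20% mIoU, assuming five classes (background plus the four xView2 damage levels) and an mIoU that averages over all classes including background. Background IoU is close to 1 because most pixels are background, every other class scores 0, so the mean is roughly 1/5. The pixel counts below are illustrative, not measured.

```python
import math
from typing import List, Sequence


def per_class_iou(target: Sequence[int], pred: Sequence[int], num_classes: int) -> List[float]:
    """IoU = TP / (TP + FP + FN) for each class; NaN if the class never occurs."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(target, pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(target, pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(target, pred) if t == c and p != c)
        union = tp + fp + fn
        ious.append(tp / union if union else math.nan)
    return ious


def mean_iou(target: Sequence[int], pred: Sequence[int], num_classes: int) -> float:
    """Average IoU over classes that occur in the target or the prediction."""
    ious = [v for v in per_class_iou(target, pred, num_classes) if not math.isnan(v)]
    return sum(ious) / len(ious)


# Illustrative 100-pixel tile: 95 background pixels, 5 building pixels across the
# four damage levels (labels 1 to 4). The model predicts background everywhere.
target = [0] * 95 + [1, 1, 2, 3, 4]
pred = [0] * len(target)

ious = per_class_iou(target, pred, num_classes=5)
miou = mean_iou(target, pred, num_classes=5)
# Background IoU is 95 / 100 = 0.95 and the four damage classes score 0,
# so mIoU is 0.95 / 5 = 0.19.
```

The collapse therefore caps mIoU at just under 1 / (number of classes), no matter how good the background IoU is.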

Things I tried that had worked in our other implementation but not here, off the top of my head:

Class-weighted Dice loss
Combo loss: Focal + Dice, class-weighted
Sigmoid instead of softmax on all classes
Invert the background class, so it's positive when it shows the foreground, then use sigmoids and an adjusted weighted Dice loss
Importance sampling (implemented by Sebastian H)
Rotation as an additional augmentation
Normalisation to [-1,1] instead of mean 0, std 1
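For reference, a textbook formulation of the class-weighted Dice + focal combo loss from the list above, written on plain per-pixel probabilities so it runs without a deep learning framework. This is a sketch, not the loss from either implementation; the class weights, `gamma=2.0`, and the equal mixing of the two terms are assumptions.

```python
import math
from typing import Sequence


def weighted_dice_loss(probs: Sequence[Sequence[float]], target: Sequence[int],
                       class_weights: Sequence[float], eps: float = 1e-6) -> float:
    """Class-weighted soft Dice loss.

    probs[i][c] is the predicted probability of class c at pixel i,
    target[i] is the ground-truth label of pixel i.
    """
    total, weight_sum = 0.0, 0.0
    for c, w in enumerate(class_weights):
        intersection = sum(p[c] for p, t in zip(probs, target) if t == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for t in target if t == c)
        dice = (2.0 * intersection + eps) / (pred_mass + true_mass + eps)
        total += w * (1.0 - dice)
        weight_sum += w
    return total / weight_sum


def weighted_focal_loss(probs: Sequence[Sequence[float]], target: Sequence[int],
                        class_weights: Sequence[float], gamma: float = 2.0,
                        eps: float = 1e-7) -> float:
    """Mean of -alpha_t * (1 - p_t)^gamma * log(p_t), alpha_t = weight of the true class."""
    total = 0.0
    for p, t in zip(probs, target):
        p_t = min(max(p[t], eps), 1.0)  # clamp to avoid log(0)
        total += -class_weights[t] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(target)


def combo_loss(probs: Sequence[Sequence[float]], target: Sequence[int],
               class_weights: Sequence[float], gamma: float = 2.0,
               focal_weight: float = 1.0, dice_weight: float = 1.0) -> float:
    """Focal + Dice, both class-weighted, mixed with fixed coefficients."""
    return (focal_weight * weighted_focal_loss(probs, target, class_weights, gamma)
            + dice_weight * weighted_dice_loss(probs, target, class_weights))
```

In a real training loop the same formulas would be expressed on tensors; the sketch only pins down the definitions being compared. A perfect one-hot prediction gives a loss of 0, while a uniform prediction is penalised by both terms.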

DiffUNet performs better than ConcUNet, which is also not what I'd expect, given that ConcUNet should have more information to base its decision on.
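The expectation that ConcUNet has more information can be made concrete. Assuming DiffUNet fuses the pre- and post-disaster encoder features by subtraction and ConcUNet by channel-wise concatenation (inferred from the names, not confirmed here), the sketch below shows two different feature pairs that become identical after subtraction but stay distinct after concatenation. `fuse_diff` and `fuse_concat` are hypothetical helpers operating on a single pixel's feature vector.

```python
from typing import List, Sequence


def fuse_diff(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Assumed DiffUNet-style fusion: C channels, post minus pre."""
    return [b - a for a, b in zip(pre, post)]


def fuse_concat(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Assumed ConcUNet-style fusion: 2C channels, pre followed by post."""
    return list(pre) + list(post)


# Two different (pre, post) pairs describing the same change.
pair_a = ([0.0, 1.0], [1.0, 2.0])
pair_b = ([5.0, 5.0], [6.0, 6.0])

# Subtraction maps both pairs to [1.0, 1.0]; concatenation keeps them apart.
same_after_diff = fuse_diff(*pair_a) == fuse_diff(*pair_b)
same_after_concat = fuse_concat(*pair_a) == fuse_concat(*pair_b)
```

Concatenation is at least as informative, since the difference can always be computed from it; one possible reading of the result is that the explicit subtraction hands the decoder a change signal it would otherwise have to learn.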


Add ResNet-based UNet as a comparison baseline for xView2 #60