I've done a few more experiments to stress-test the results. The main change was shuffling the samples before partitioning, using a fixed seed for consistency across model variants and for reproducibility. The metrics in the first comment did not involve shuffling the data before partitioning. These experiments demonstrate how much the results vary across different data distributions.
Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
1 | From scratch | 46 | 0.57 | 42 |
1 | Finetuned | 23 | 0.42 | 42 |
2 | From scratch | 36 | 0.44 | 33 |
2 | Finetuned | 8 | 0.31 | 33 |
3 | From scratch | 30 | 0.38 | 21 |
3 | Finetuned | 11 | 0.33 | 21 |
4 | From scratch | 19 | 0.48 | 9 |
4 | Finetuned | 26 | 0.38 | 9 |
In the above table, "finetuned" refers to finetuning with a multi-layer convolutional decoder.
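For reference, the shuffle-before-partition step can be as simple as a seeded RNG. A minimal sketch — the `shuffled_split` helper, the file names, and the 80/10/10 split are illustrative assumptions, not the exact setup used:

```python
import random

def shuffled_split(chip_paths, seed, train_frac=0.8, val_frac=0.1):
    """Shuffle samples with a fixed seed, then partition into train/val/test.

    Reusing the same seed across model variants guarantees the from-scratch
    and finetuned runs see identical splits.
    """
    paths = list(chip_paths)
    random.Random(seed).shuffle(paths)  # seeded shuffle -> reproducible order
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    return (
        paths[:n_train],                 # train
        paths[n_train:n_train + n_val],  # val
        paths[n_train + n_val:],         # test
    )

# e.g. with the seed from experiment 1 in the table above:
train, val, test = shuffled_split([f"chip_{i}.npz" for i in range(100)], seed=42)
```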
An experiment in which the multi-layer convolutional decoder was replaced with the Clay decoder:

Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
5 | From scratch | 36 | 0.44 | 33 |
5 | Finetuned (conv decoder) | 8 | 0.31 | 33 |
5 | Finetuned (clay decoder) | 9 | 0.33 | 33 |
An experiment where the number of epochs was standardized instead of determined by an early stopping callback. For a set number of 10 epochs, we achieved:

Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
6 | From scratch | 10 | 0.14 | 42 |
6 | Finetuned (conv decoder) | 10 | 0.51 | 42 |
6 | Finetuned (clay decoder) | 10 | 0.44 | 42 |
For the same epoch-standardized setup with a set number of 30 epochs, we achieved:

Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
7 | From scratch | 30 | 0.55 | 42 |
7 | Finetuned (conv decoder) | 30 | 0.52 | 42 |
7 | Finetuned (clay decoder) | 30 | 0.53 | 42 |
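For context on the two regimes: in experiments 1–5 the epoch count was whatever the early stopping callback allowed, while here it is fixed. A sketch of the difference in PyTorch Lightning terms (the `monitor` and `patience` values are assumptions, not the exact configuration used):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Early-stopping regime (experiments 1-5): epoch count varies per run.
trainer_early_stop = Trainer(
    max_epochs=100,  # upper bound only; the callback decides when to stop
    callbacks=[EarlyStopping(monitor="val_loss", patience=5, mode="min")],
)

# Epoch-controlled regime (experiments 6-9): every variant trains for the
# same fixed number of epochs, removing stopping time as a confounder.
trainer_fixed = Trainer(max_epochs=30)
```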
Figures (omitted): From scratch; Finetuned (conv and clay decoder). From scratch; Finetuned (conv decoder); Finetuned (clay decoder).
The same epoch-controlled experiment (30 epochs), repeated with two more seeds:
Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
8 | From scratch | 30 | 0.41 | 21 |
8 | Finetuned (conv decoder) | 30 | 0.40 | 21 |
8 | Finetuned (clay decoder) | 30 | 0.44 | 21 |
Figures (omitted): From scratch; Finetuned (conv and clay decoders).
Experiment | Variant | Number of epochs | Mean IoU | Seed |
---|---|---|---|---|
9 | From scratch | 30 | 0.44 | 33 |
9 | Finetuned (conv decoder) | 30 | 0.42 | 33 |
9 | Finetuned (clay decoder) | 30 | 0.47 | 33 |
Figures (omitted): From scratch; Finetuned (conv and clay decoders).
Model parameter summaries:

From scratch:

Name | Type | Params |
---|---|---|
model | FloodDetector_fromscratch | 713 K |
dice_metric | Dice | 0 |

713 K trainable params; 0 non-trainable params; 713 K total params; 2.853 MB total estimated model params size.

Finetuned (conv decoder):

Name | Type | Params |
---|---|---|
model | FloodDetector_clay_convdecoder | 98.3 M |
criterion | BCEWithLogitsLoss | 0 |

3.0 M trainable params; 95.3 M non-trainable params; 98.3 M total params; 393.032 MB total estimated model params size.

Finetuned (clay decoder):

Name | Type | Params |
---|---|---|
model | FloodDetector_clay_decoder | 127 M |
criterion | BCEWithLogitsLoss | 0 |

32.4 M trainable params; 95.3 M non-trainable params; 127 M total params; 510.809 MB total estimated model params size.
Thanks for the detailed summary and thorough work @lillythomas 🚀 This sets the example for finetuning, which is great!
We can iterate on this later if needed; feel free to close this if you agree @lillythomas.
We've completed our first benchmarking exercise for Clay v0 using the Microsoft Flood detection dataset (see https://github.com/Clay-foundation/model/issues/83 for details).
The exercise entailed a comparison of a model trained from scratch vs. a model finetuned from Clay. Both are designed to segment input into binary maps of flood and background. The data module ingests datacubes like those used to train Clay, with the addition of water masks and cloud/cloud-shadow masks; it applies the cloud/cloud-shadow masking and adds the water masks (labels) to the batch dictionary.
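A rough sketch of that batch assembly, assuming per-chip tensors and a simple zero-fill for masked pixels (the function name and mask encoding are illustrative; the real datacube schema lives in the Clay repo):

```python
import torch

def assemble_batch(pixels: torch.Tensor, water_mask: torch.Tensor,
                   cloud_mask: torch.Tensor) -> dict:
    """Apply cloud/cloud-shadow masking and attach the water mask as the label.

    pixels:     (C, H, W) float tensor of datacube bands
    water_mask: (H, W) binary tensor, 1 = water (the segmentation label)
    cloud_mask: (H, W) binary tensor, 1 = cloud or cloud shadow
    """
    # Zero out cloudy / cloud-shadowed pixels in every band.
    pixels = pixels * (1.0 - cloud_mask.float()).unsqueeze(0)
    # The label rides along in the batch dictionary.
    return {"pixels": pixels, "label": water_mask.float()}
```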
The from-scratch model is a simple encoder/decoder architecture with He initialization, dropout regularization, learning rate scheduling (starting at `lr=1e-3`), and early stopping. The loss function is binary cross-entropy. The finetuned model uses Clay as the encoder with a simple multi-layer convolutional decoder. It also implements learning rate scheduling and early stopping, and minimizes a binary cross-entropy loss.
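A minimal sketch of the finetuned variant as described above, assuming a `clay_encoder` module that returns a spatial feature map; the decoder widths, the frozen encoder, and the upsampling step are illustrative assumptions rather than the exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FinetunedFloodDetector(nn.Module):
    def __init__(self, clay_encoder: nn.Module, feat_channels: int):
        super().__init__()
        self.encoder = clay_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze pretrained Clay weights
        # Simple multi-layer convolutional decoder producing 1-channel logits.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(x)       # (B, feat_channels, h, w)
        logits = self.decoder(feats)  # (B, 1, h, w)
        # Upsample back to input resolution for the per-pixel BCE loss.
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

# Trained with nn.BCEWithLogitsLoss() plus an LR scheduler and early stopping.
```

A frozen encoder is consistent with the parameter summaries earlier in the thread, where the conv-decoder variant shows 95.3 M non-trainable (Clay encoder) vs. 3.0 M trainable (decoder) parameters.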
Results
The model from scratch converged (based on validation loss) in 23 epochs, whereas the model finetuned from Clay converged in 19 epochs.
Loss curves
From scratch: train loss and val loss (plots omitted).
Finetuned: train loss and val loss (plots omitted).
Evaluation metric
Our evaluation entailed calculating mean IoU on the validation dataset.
From scratch: mIoU = 0.27
Finetuned: mIoU = 0.31
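For reference, a minimal sketch of how mean IoU can be computed for binary masks; the 0.5 threshold and per-image averaging are assumptions about the exact evaluation:

```python
import torch

def mean_iou(probs: torch.Tensor, labels: torch.Tensor,
             threshold: float = 0.5, eps: float = 1e-7) -> float:
    """Mean IoU over a batch of binary segmentation predictions.

    probs:  (N, H, W) predicted water probabilities in [0, 1]
    labels: (N, H, W) binary ground-truth water masks
    """
    preds = (probs > threshold).float()
    labels = labels.float()
    intersection = (preds * labels).sum(dim=(1, 2))
    union = ((preds + labels) > 0).float().sum(dim=(1, 2))
    return ((intersection + eps) / (union + eps)).mean().item()
```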
❗Visual comparison
Blue band on left, water mask label in middle, prediction on right.
From scratch (image omitted)
Finetuned (image omitted)
Notebook: https://github.com/Clay-foundation/model/blob/benchmark_segmentation_task/notebooks/flood_benchmark_segmentation.ipynb