Closed lillythomas closed 5 months ago
We now have 351 (image, label) pairs in s3://clay-benchmark/c2smsfloods/datacube/chips_512/. The path structure remains the same as in the above comment.
A lot of this has been addressed and tested with v0.2. Let's consider revisiting this with v1 in a separate issue when appropriate.
We're implementing the Cloud to Street - Microsoft flood dataset as our first benchmark dataset, to be used for linear probing and evaluation of finetuning on a downstream task.
The dataset consists of two of the three datacube inputs to our pretext model (Sentinel-1 and Sentinel-2), along with raster water mask labels for both sensors. The images are 512x512xC pixels. Ideally, we would have used the images as is, but that wasn't possible because 1) the Sentinel-1 VV and VH images underwent RTC with a different DEM than the one used for the Sentinel-1 product in the Planetary Computer STAC catalog, and 2) the Sentinel-2 images were L1C top-of-atmosphere instead of L2A surface reflectance. Therefore, we created a redux of the original data pipeline (see PR #75) used to create the training data for the pretext model, and used it to generate datacubes for the benchmark dataset from the geospatial bounds and timestamp (from the granule name). The generated datacubes have all three inputs matching the exact specs of the pretext model's training data, at 512x512 pixels.
The dataset lives on S3 at s3://clay-benchmark/c2smsfloods/, and specifically this processed datacube dataset is within s3://clay-benchmark/c2smsfloods/datacube/chips_512/. See the structure below for how the data is stored.
Note (as of 12/8/2023): I hit a rate limit with my Planetary Computer API key (from a lot of use today) and was blocked from generating more than 43 datacubes. I'll have to try again tomorrow to see if I can get past this.
Here are some example benchmark datacubes:
So, the first linear probing and finetuning task will be flood segmentation using this dataset. We'll implement a lightweight set of layers to achieve this and evaluate using standard segmentation metrics (e.g. IoU, Dice, F1).
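For reference, here is a minimal sketch of how the metrics mentioned above could be computed for a single chip. It is not the evaluation code for this benchmark: the function name `segmentation_metrics` is hypothetical, and it assumes binary masks where 1 means water and 0 means not water, with no nodata class (the actual label encoding may differ).

```python
import numpy as np


def segmentation_metrics(pred, target, eps=1e-7):
    """Return IoU, Dice and F1 for a pair of binary masks.

    Assumption: 1 (or True) marks water pixels, 0 (or False) everything else.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)

    # Pixel-level confusion counts for the water class.
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()

    # eps avoids division by zero when a mask contains no water;
    # in that case the metrics evaluate to 0.0.
    iou = tp / (tp + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)

    return {"iou": float(iou), "dice": float(dice), "f1": float(f1)}


# Toy 2x2 example: one true positive, one false positive, one false negative.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
metrics = segmentation_metrics(pred, target)
print(metrics)
```

Note that for a single binary class, Dice and F1 are algebraically the same quantity, so the two values should agree up to the `eps` term. Reporting both only adds information if they are aggregated differently (e.g. per chip vs. over all pixels).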