MWieland / s1s2_water

S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images


This repository provides tools to work with the S1S2-Water dataset.

The S1S2-Water dataset is a global reference dataset for training, validation and testing of convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. The dataset consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account the predominant land cover and the availability of water bodies. Each sample is complemented with STAC-compliant metadata and a Digital Elevation Model (DEM) raster from the Copernicus DEM.

The following article describes the dataset:

Wieland, M., Fichtner, F., Martinis, S., Groth, S., Krullikowski, C., Plank, S., Motagh, M. (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, doi: 10.1109/JSTARS.2023.3333969.

Dataset version update (2024-05-24)

The dataset on Zenodo has been updated to a new version. The Sentinel-1 scenes for samples #31 and #59 were missing from v1.0.0 and are now included in v1.0.1, along with all relevant masks and metadata.

Dataset access

The dataset (~170 GB, v1.0.1) is available for download from Zenodo.

Download the dataset parts and extract them into a single data directory as follows.

└── data/
    ├── 1/
    │   ├── sentinel12_copdem30_1_elevation.tif
    │   ├── sentinel12_copdem30_1_slope.tif
    │   ├── sentinel12_s1_1_img.tif
    │   ├── sentinel12_s1_1_msk.tif
    │   ├── sentinel12_s1_1_valid.tif
    │   ├── sentinel12_s2_1_img.tif
    │   ├── sentinel12_s2_1_msk.tif
    │   ├── sentinel12_s2_1_valid.tif
    │   └── sentinel12_1_meta.json
    ├── 5/
    │   ├── sentinel12_copdem30_5_elevation.tif
    │   ├── sentinel12_copdem30_5_slope.tif
    │   ├── sentinel12_s1_5_img.tif
    │   ├── sentinel12_s1_5_msk.tif
    │   ├── sentinel12_s1_5_valid.tif
    │   ├── sentinel12_s2_5_img.tif
    │   ├── sentinel12_s2_5_msk.tif
    │   ├── sentinel12_s2_5_valid.tif
    │   └── sentinel12_5_meta.json
    ├── .../
    │   └── ...
    └── catalog.json
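
After extracting the dataset parts, it can be useful to check that every sample directory is complete. The following is a minimal sketch that derives the expected file names from the directory layout shown above; the helper names `expected_files` and `missing_files` are hypothetical and not part of this repository.

```python
from pathlib import Path


def expected_files(sample_id):
    """Return the nine file names expected in one sample directory."""
    sid = str(sample_id)
    names = [f"sentinel12_copdem30_{sid}_{layer}.tif" for layer in ("elevation", "slope")]
    for sensor in ("s1", "s2"):
        names += [f"sentinel12_{sensor}_{sid}_{layer}.tif" for layer in ("img", "msk", "valid")]
    names.append(f"sentinel12_{sid}_meta.json")
    return names


def missing_files(data_dir):
    """Map each sample directory name to the expected files that are absent."""
    report = {}
    for sample_dir in sorted(Path(data_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skips catalog.json and other top-level files
        missing = [n for n in expected_files(sample_dir.name) if not (sample_dir / n).exists()]
        if missing:
            report[sample_dir.name] = missing
    return report
```

Calling `missing_files("/path/to/data_directory")` returns an empty dictionary when all sample directories contain the full set of layers.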

Dataset information

Each raster file follows the naming scheme sentinel12_SENSOR_ID_LAYER.tif (e.g. sentinel12_s1_5_img.tif); the metadata file of a sample is named sentinel12_ID_meta.json. Raster layers are stored as Cloud Optimized GeoTIFF (COG) and are projected to Universal Transverse Mercator (UTM).
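
The naming scheme can be parsed with a regular expression, for example to group files by sensor or sample. This is an illustrative sketch only; the function name `parse_layer_name` is hypothetical, and the accepted sensor prefixes are taken from the directory listing above.

```python
import re

# sentinel12_SENSOR_ID_LAYER.tif, with SENSOR in {s1, s2, copdem30}
_LAYER_NAME = re.compile(
    r"^sentinel12_(?P<sensor>s1|s2|copdem30)_(?P<id>\d+)_(?P<layer>[a-z]+)\.tif$"
)


def parse_layer_name(filename):
    """Split a raster file name into (sensor, sample ID, layer)."""
    match = _LAYER_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an S1S2-Water raster file name: {filename!r}")
    return match["sensor"], int(match["id"]), match["layer"]
```

The metadata file (sentinel12_ID_meta.json) has no sensor part and is deliberately rejected by this pattern.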

| Sensor | Layer | Description | Values | Format | Bands |
|---|---|---|---|---|---|
| S1 | IMG | Sentinel-1 image, GRD product | Unit: dB (scaled by factor 100) | GeoTIFF, 10980 x 10980 px, 2 bands | 0: VV, 1: VH |
| S2 | IMG | Sentinel-2 image, L1C product | Unit: TOA reflectance (scaled by factor 10000) | GeoTIFF, 10980 x 10980 px, 6 bands | 0: Blue, 1: Green, 2: Red, 3: NIR, 4: SWIR1, 5: SWIR2 |
| S1 / S2 | MSK | Annotation mask, hand-labelled water mask | 0: No Water, 1: Water | GeoTIFF, 10980 x 10980 px, 1 band | 0: Water mask |
| S1 / S2 | VALID | Valid pixel mask, hand-labelled valid pixel mask | 0: Invalid (cloud, cloud-shadow, nodata), 1: Valid | GeoTIFF, 10980 x 10980 px, 1 band | 0: Valid mask |
| COPDEM30 | ELEVATION | Copernicus DEM elevation | Unit: Meters | GeoTIFF, 3660 x 3660 px, 1 band | 0: Elevation |
| COPDEM30 | SLOPE | Copernicus DEM slope | Unit: Degrees | GeoTIFF, 3660 x 3660 px, 1 band | 0: Slope |
| N.a. | META | Metadata | STAC metadata item | JSON | N.a. |
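
Because the image layers store scaled values, they need to be divided by the scale factor to recover physical units. The sketch below assumes the rasters have already been read into NumPy arrays (for example with a GeoTIFF reader of your choice); the function names are hypothetical, and only the scale factors and mask values come from the table above.

```python
import numpy as np

# Scale factors from the layer table: stored value = physical value * factor
SCALE = {"s1": 100.0, "s2": 10000.0}


def to_physical(img, sensor):
    """Convert stored values to dB (s1) or TOA reflectance (s2)."""
    return img.astype(np.float32) / SCALE[sensor]


def water_fraction(msk, valid):
    """Fraction of valid pixels (valid == 1) that are labelled water (msk == 1)."""
    valid_px = valid == 1
    n_valid = int(valid_px.sum())
    if n_valid == 0:
        return float("nan")  # no valid pixels, fraction is undefined
    return float((msk[valid_px] == 1).sum()) / n_valid
```

For example, a stored Sentinel-1 value of -1850 corresponds to -18.5 dB, and a stored Sentinel-2 value of 2500 corresponds to a TOA reflectance of 0.25.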

Data preparation

Make sure to download the dataset as described above. Then clone this repository, adjust settings.toml and run the data preparation script to prepare the dataset according to your desired settings.

The following command splits the images and masks of a specific sensor (Sentinel-1 or Sentinel-2) into training, validation and testing tiles with a predefined shape and band combination. Slope information can be appended to the image band stack if required.

$ python --settings settings.toml
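
To illustrate the tiling step, the following sketch enumerates non-overlapping tile windows over an image. It is not the repository's implementation: the function name `tile_windows` is hypothetical, and dropping partial tiles at the right and bottom edges is an assumption made here for simplicity.

```python
def tile_windows(height, width, tile_shape):
    """Return (row, col, tile_height, tile_width) for every full tile.

    Assumption: partial tiles at the image edges are dropped.
    """
    tile_h, tile_w = tile_shape
    windows = []
    for row in range(0, height - tile_h + 1, tile_h):
        for col in range(0, width - tile_w + 1, tile_w):
            windows.append((row, col, tile_h, tile_w))
    return windows
```

Under this assumption, a 10980 x 10980 px image with 256 x 256 px tiles yields 42 x 42 = 1764 full tiles, leaving a 228 px strip uncovered along two edges.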

Data preparation parameters are defined in a settings TOML file (--settings):

SENSOR = "s2"                           # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
TILE_SHAPE = [256, 256]                 # desired tile shape in pixels
IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5]      # desired image band combination
SLOPE = true                            # append slope band to image bands
EXCLUDE_NODATA = true                   # exclude tiles with nodata values
DATA_DIR = "/path/to/data_directory"    # data directory that holds the original images
OUT_DIR = "/path/to/output_directory"   # output directory that stores the prepared train, val and test tiles

# Sentinel-1 image bands
# {"VV": 0, "VH": 1}

# Sentinel-2 image bands
# {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}
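
Before running the preparation, the settings can be sanity-checked against the band layout listed above. This is a minimal sketch using Python's standard `tomllib` module (Python 3.11+; the fallback to the third-party `tomli` package is an assumption for older interpreters). The functions `load_settings` and `n_channels` are hypothetical helpers, not part of this repository.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed backport for older Python versions

# Number of image bands per sensor, from the band listings above
N_BANDS = {"s1": 2, "s2": 6}


def load_settings(text):
    """Parse settings TOML text and validate sensor, bands and tile shape."""
    settings = tomllib.loads(text)
    sensor = settings["SENSOR"]
    if sensor not in N_BANDS:
        raise ValueError(f"SENSOR must be 's1' or 's2', got {sensor!r}")
    bad = [i for i in settings["IMG_BANDS_IDX"] if not 0 <= i < N_BANDS[sensor]]
    if bad:
        raise ValueError(f"band indices {bad} do not exist for sensor {sensor!r}")
    if len(settings["TILE_SHAPE"]) != 2:
        raise ValueError("TILE_SHAPE must have exactly two entries")
    return settings


def n_channels(settings):
    """Channels per prepared tile: selected bands plus optional slope band."""
    return len(settings["IMG_BANDS_IDX"]) + (1 if settings["SLOPE"] else 0)
```

With the example settings above (six Sentinel-2 bands and SLOPE = true), each prepared image tile would have seven channels. Requesting band indices 2 to 5 with SENSOR = "s1" is rejected, since Sentinel-1 only provides VV and VH.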

Information on the preprocessing steps applied to Sentinel-1 imagery can be found in the SNAP GPT file.


Installation

$ conda env create --file environment.yaml
$ conda activate s1s2_water