Berkeley-Data / hpt

prepare subset SEN12MS and EDA for quicker training #7

Closed taeil closed 3 years ago

taeil commented 3 years ago

Check this dataset out. 180k triplets, georeferenced, multi band, multi modal, multi resolution: "SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion" https://arxiv.org/abs/1906.07789

download instructions

taeil commented 3 years ago

Not sure whether there are multiple images of one location taken at different times.

suryagutta commented 3 years ago

510 GB, 542,000 files. Dataset locations:

taeil commented 3 years ago

According to the paper, it is not very clear:

The Sentinel-1 data can be recognized by the abbreviation s1, the Sentinel-2 data by s2, and the MODIS land cover data by lc; the individual patches can be identified by the token pXXX where XXX denotes a unique identifier number per patch. Thus, the file naming convention follows the following scheme: ROIsSSSS_SEASON_DD_pXXX.tif, where SSSS denotes the seed value, SEASON denotes the meteorological season as defined for the northern hemisphere, DD denotes the data identifier, and XXX denotes the patch identifier. Of course, we are aware that the seasonal structuring of the dataset is only of little semantic worth since we have taken the seasons of the northern hemisphere as a reference. To allow end-users a sub-structuring of the dataset taking semantically meaningful seasons into account, we provide the file seasons.csv with the metadata of the dataset. It declares, which scenes actually were acquired in spring, summer, winter, and fall from a climatic point of view.
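For reference, here is a minimal sketch of how the naming scheme quoted above could be parsed, assuming the fields are joined by underscores. The names `PatchName` and `parse_patch_name` are made up for illustration and are not part of any SEN12MS tooling. The optional numeric field between the data identifier and the patch token is also an assumption, added in case the files on disk carry an extra scene number that the quoted scheme does not mention.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: ROIs<seed>_<season>_<data id>[_<scene>]_p<patch>.tif
# The optional <scene> field is an assumption, not part of the quoted scheme.
_PATTERN = re.compile(
    r"^ROIs(?P<seed>\d+)_(?P<season>[A-Za-z]+)_(?P<data>s1|s2|lc)"
    r"(?:_(?P<scene>\d+))?_p(?P<patch>\d+)\.tif$"
)


class PatchName(NamedTuple):
    seed: int              # SSSS, the seed value
    season: str            # meteorological season (northern hemisphere)
    data: str              # "s1", "s2" or "lc"
    scene: Optional[int]   # extra numeric field, if present (assumption)
    patch: int             # XXX, the patch identifier


def parse_patch_name(filename: str) -> PatchName:
    """Split a patch file name into its components, or raise ValueError."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    scene = match.group("scene")
    return PatchName(
        seed=int(match.group("seed")),
        season=match.group("season"),
        data=match.group("data"),
        scene=int(scene) if scene is not None else None,
        patch=int(match.group("patch")),
    )
```

Note that the season in the file name is the northern-hemisphere one; per the quoted paragraph, seasons.csv would be the place to look up the climatic season.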

taeil commented 3 years ago

Added a notebook with basic visualization.

taeil commented 3 years ago

Generated samples are located at /scratch/crguest/data/sen12ms_small. Here is a notebook too.
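For anyone who wants to draw a different subset, here is a rough sketch of one way to do it. This is not necessarily how the samples above were generated. It assumes file names contain the data identifier as `_s1_`, `_s2_` or `_lc_`, and that the three files of a triplet differ only in that token. The function name `sample_triplets` is hypothetical.

```python
import random
import re
from collections import defaultdict

# Matches the data identifier token (assumed to be delimited by underscores).
_DATA_ID = re.compile(r"_(s1|s2|lc)_")


def sample_triplets(filenames, n_triplets, seed=0):
    """Pick up to n_triplets complete (s1, s2, lc) triplets at random."""
    groups = defaultdict(dict)
    for name in filenames:
        match = _DATA_ID.search(name)
        if match is None:
            continue  # not a patch file, skip it
        # Key shared by all three files of a triplet: name without the data id.
        key = _DATA_ID.sub("_", name, count=1)
        groups[key][match.group(1)] = name

    # Keep only triplets where all three modalities are present.
    complete = sorted(key for key, files in groups.items() if len(files) == 3)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(complete, min(n_triplets, len(complete)))
    return [sorted(groups[key].values()) for key in sorted(chosen)]
```

Sampling whole triplets rather than individual files keeps the Sentinel-1, Sentinel-2 and land cover patches aligned in the subset.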