ameraner / dsen2-cr

DSen2-CR: A network for removing clouds from Sentinel-2 images. This repo contains the model code, written in Python/Keras, as well as links to pre-trained checkpoints and the SEN12MS-CR dataset.
GNU General Public License v3.0

About data split #4

Closed lironui closed 2 years ago

lironui commented 2 years ago

Hello,

Could you please tell me the details about the 10 ROIs for validation and 10 ROIs for testing?

Regards.

P8H commented 2 years ago

I am interested in this too! I want to properly evaluate the images predicted by the provided pre-trained model, but for that I need to know which images were used for training, validation, and testing.

ameraner commented 2 years ago

Hi both, sorry for the late reply. I have updated the Readme with the corresponding information. Feel free to reopen if you need more information.

P8H commented 2 years ago

Hi Andrea, thanks for your response. It looks like you forgot to make the link to the dataset split publicly accessible. I get a message saying that I have no rights to access the file.

ameraner commented 2 years ago

Indeed, fixed now, thank you!

P8H commented 2 years ago

Thanks! I have the file list now, but the filenames are not in the same format as in the referenced dataset.

Dataset file list format: "ROIs1158_spring_101_0.tif"
Dataset format: "ROIs1970_fall_93_p128.tif"

I tried to translate the filenames, but without success. Could you give a hint on how the file list names can be converted so that they match the files in the referenced dataset, if that is even possible?

ameraner commented 2 years ago

Good question, maybe @PatrickTUM can help with this?

P8H commented 2 years ago

It looks like the split in datasetfilelist.csv was done by the "scene_id"? All parts of one scene should be in the same set.

Is it safe to assume that the "scene_id" in datasetfilelist.csv is the same as in the referenced dataset? If so, the part_id could be ignored.

My understanding of the naming format: "ROIs1158_spring_<scene_id>_<part_id>.tif"
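To make that reading of the names concrete, here is a minimal sketch that splits a filename into its parts and derives a per-scene key. It only covers the two filename shapes quoted in this thread. The names `PatchName`, `parse_patch_name`, and `scene_key` are made up for illustration, and treating the third field as a shared "scene_id" is exactly the assumption being asked about, not something confirmed by the dataset authors.

```python
import re
from typing import NamedTuple


class PatchName(NamedTuple):
    """Fields of a patch filename (hypothetical helper type)."""
    roi: str       # e.g. "ROIs1158"
    season: str    # e.g. "spring"
    scene_id: int  # e.g. 101
    part_id: str   # e.g. "0" or "p128", kept as text


# Matches both shapes quoted above:
#   ROIs1158_spring_101_0.tif   and   ROIs1970_fall_93_p128.tif
_PATTERN = re.compile(r"^(ROIs\d+)_([a-z]+)_(\d+)_(p?\d+)\.tif$")


def parse_patch_name(filename: str) -> PatchName:
    """Split a patch filename into ROI, season, scene_id and part_id."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    roi, season, scene, part = match.groups()
    return PatchName(roi, season, int(scene), part)


def scene_key(filename: str) -> str:
    """Return the filename without its part_id, e.g. 'ROIs1970_fall_93'."""
    p = parse_patch_name(filename)
    return f"{p.roi}_{p.season}_{p.scene_id}"
```

If the assumption holds, two files belong to the same scene exactly when `scene_key` returns the same string for both, regardless of how the part_id is written.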

PatrickTUM commented 2 years ago

Hi @lironui & @P8H,

the test split ROIs of SEN12MS-CR are as follows:

'ROIs1158_spring_106', 'ROIs1158_spring_123', 'ROIs1158_spring_140', 'ROIs1158_spring_31', 'ROIs1158_spring_44', 'ROIs1868_summer_119', 'ROIs1868_summer_73', 'ROIs1970_fall_139', 'ROIs2017_winter_108', 'ROIs2017_winter_63'
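As a rough sketch of how this list could be used, the snippet below marks a patch file as belonging to the test split when its name, with the trailing patch suffix removed, matches one of the ROIs above. The function name `split_of` and the label "train_or_val" are made up for illustration, and it assumes that filenames follow the "<ROI>_<season>_<scene>_<patch>.tif" shape discussed earlier in this thread.

```python
# Test split ROIs, copied from the list above.
TEST_ROIS = frozenset({
    "ROIs1158_spring_106", "ROIs1158_spring_123", "ROIs1158_spring_140",
    "ROIs1158_spring_31", "ROIs1158_spring_44",
    "ROIs1868_summer_119", "ROIs1868_summer_73",
    "ROIs1970_fall_139",
    "ROIs2017_winter_108", "ROIs2017_winter_63",
})


def split_of(filename: str) -> str:
    """Return 'test' if the patch's scene is a test ROI, else 'train_or_val'.

    Assumes the last underscore-separated field is the patch id.
    """
    stem = filename.rsplit(".", 1)[0]   # drop the ".tif" extension
    scene = stem.rsplit("_", 1)[0]      # drop the trailing patch id
    return "test" if scene in TEST_ROIS else "train_or_val"


print(split_of("ROIs1158_spring_106_p12.tif"))
print(split_of("ROIs1970_fall_93_p128.tif"))
```

Matching on the whole scene name rather than a string prefix keeps 'ROIs1158_spring_31' from also catching scenes such as 'ROIs1158_spring_310'.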

You can find the patch-wise information of splits here. The validation split is just a suggestion and may be adapted to your preferences. Note that the released data set is not identical to the one used in [1], so the splits aren't exactly identical either. The collected ROIs of SEN12MS-CR are a subset of those of [2], and to make the data directly comparable to [2] (i.e. to establish pixel-wise correspondences), we applied an additional CRS transform in [3]. This transform, however, was not included in the earlier version of the data set. You can find additional information regarding the changes in [3].

Hoping this helps! Cheers

[1] Meraner, A., Ebel, P., Zhu, X. X., & Schmitt, M. (2020). Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion. ISPRS Journal of Photogrammetry and Remote Sensing, 166, 333-346.
[2] Schmitt, Michael, et al. "SEN12MS--A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion." arXiv preprint arXiv:1906.07789 (2019).
[3] Ebel, P., Meraner, A., Schmitt, M., & Zhu, X. X. (2020). Multisensor data fusion for cloud removal in global and all-season Sentinel-2 imagery. IEEE Transactions on Geoscience and Remote Sensing.

P8H commented 2 years ago

Indeed, that helps! Thanks a lot @PatrickTUM and @ameraner!