PatrickTUM / SEN12MS-CR-TS

https://patrickTUM.github.io/cloud_removal/

Question #13

Closed by Darxeal 6 months ago

Darxeal commented 7 months ago

Hello, I read the paper but I'm struggling to understand something. The dataset supposedly contains pairs of cloudy and cloudless images, and the cloudy images are not created synthetically. Does this mean the paired images are not temporally aligned?

If yes, doesn't this introduce a major flaw into models trained on this dataset, since they might also learn temporal changes? I would expect the purpose of cloud removal to be estimating what is under the clouds "right now", not at a different time. For that, we could just retrieve the most recent cloudless image. Am I missing something?

Ly403 commented 6 months ago
I have the same question as you.
PatrickTUM commented 6 months ago

Hi @Darxeal and @Ly403,

Great to see your interest in our work! Your questions are very valid and reflect what is a common and ongoing discussion in the cloud-removal research community.

As we utilize real satellite observations, pairing each cloudy sample with the most recent cloud-free sample from a subsequent pass, the paired observations generally do not coincide in time. This means temporal change may occur between paired observations, but it should not pose a problem for training neural networks or for our overall experimental protocol, for the following reasons:

Mainly, it's because the changes are random fluctuations, i.e., noise rather than signal in the data. As our work does not focus on one particular ROI or season, and the sequential order of cloudy versus cloud-free acquisitions is not fixed, networks are discouraged from learning potential change and encouraged to focus on the cloud removal task.

One important feature of our dataset is its scale, i.e., the number of samples. Even if individual samples may display change due to the manner in which they are sampled, the within-sample variations are not systematic across the overall dataset and will thus average out: there may be sample-wise variations, but the mean discrepancy is (close to) zero. Any model biased towards generating change on individual samples would thus be penalized for such extrapolations on other samples, and the optimal behavior is to remain as faithful to the input as feasible.
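To make the "averages out" argument concrete, here is a minimal sketch (the function and variable names are my own for illustration, not the repository's API) of how one could estimate the dataset-wide mean discrepancy between paired acquisitions, restricted to pixels that are clear in both:

```python
# Sketch only: estimate the dataset-wide mean signed discrepancy between
# paired cloudy and cloud-free acquisitions, on pixels clear in both.
import numpy as np

def mean_paired_discrepancy(pairs):
    """pairs: iterable of (cloudy, cloudfree, clear_mask) tuples, where the
    images have shape [C, H, W] and clear_mask has shape [H, W] (True where
    the pixel is cloud-free in both acquisitions)."""
    total, count = 0.0, 0
    for cloudy, cloudfree, clear_mask in pairs:
        diff = (cloudy - cloudfree)[:, clear_mask]  # signed per-pixel difference
        total += diff.sum()
        count += diff.size
    # on a large, non-systematically sampled dataset this should be close to 0
    return total / max(count, 1)
```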

We saw this being the case in our prior works, which utilize losses encouraging the preservation of cloud-free input pixels [1, 2] (see the sketch after the figure below). That is, even with sample-wise fluctuations (as long as changes are non-systematic at the dataset level), having cloud removal models stay close to the original inputs is still the best strategy. For further evidence of the irrelevance of sample-wise variations, there's the figure below, which I plotted a while ago. It illustrates the pixel-wise intensity deviations between co-registered pixels in the paired optical samples, for cloudy (green) versus cloud-free (blue) pixels. I have to apologize that the figure doesn't perfectly convey my point because I cut off the y-axis back then, but the vast majority of co-registered cloud-free pixels across paired samples have practically zero difference. That is, even sample-wise variations across paired cloudy and cloud-free samples are rare overall.

[Figure: distribution of pixel-wise intensity differences between co-registered pixels of paired samples, cloudy (green) versus cloud-free (blue); y-axis truncated]
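As for the losses mentioned above, here is a minimal PyTorch sketch in the spirit of the cloud-adjusted loss of [1]; the exact formulation and weighting in the paper differ, and the names here are my own:

```python
# Sketch of a cloud-adjusted reconstruction loss, in the spirit of [1].
import torch

def cloud_adjusted_l1(pred, target, cloudy_input, cloud_mask, lam=1.0):
    """cloud_mask: 1 where the input pixel is cloudy, 0 where it is clear.
    Cloudy pixels are pulled towards the cloud-free target; clear pixels are
    pulled towards the original input, discouraging hallucinated change."""
    clear = 1.0 - cloud_mask
    loss_cloudy = (cloud_mask * (pred - target).abs()).mean()
    loss_clear = (clear * (pred - cloudy_input).abs()).mean()
    return loss_cloudy + lam * loss_clear
```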

Finally, it's not the issue's main topic, but I briefly wish to draw attention to shortcomings of the alternative approach of using simulated cloudy data. In [2] we conducted an experiment to investigate how well models trained on synthetic data (i.e., on observations with pixel-wise correspondences between paired samples, except for pixels covered by synthetic clouds) translate to real data, as an alternative to the paradigm of our dataset. The outcome was that generalization from synthetic to real data is limited, with experiments on synthetic data generally overestimating a model's performance. These results may be specific to the generation methods we considered, and there may be fancier approaches nowadays, albeit rarely used in practice. Overall, both our use of real data and the synthetic-data approach have caveats that should be treated with care, but in the end our impression is that the approach we implemented with SEN12MS-CR and SEN12MS-CR-TS is the more straightforward and preferable one.
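For context, one common way to synthesize cloudy inputs (a sketch of the general idea only, not necessarily the generation method used in [2]) is to alpha-blend a smooth noise mask onto a clear patch:

```python
# Sketch: simulate a cloudy input by alpha-blending a smooth, whitish
# noise mask onto a clear optical patch. Parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_cloudy(clear_img, sigma=16, coverage=0.5, seed=0):
    """clear_img: float array [C, H, W] in [0, 1]. Returns (cloudy_img, alpha)."""
    rng = np.random.default_rng(seed)
    noise = gaussian_filter(rng.standard_normal(clear_img.shape[1:]), sigma)
    thresh = np.quantile(noise, 1.0 - coverage)        # ~`coverage` of pixels cloudy
    alpha = np.clip((noise - thresh) / noise.std(), 0.0, 1.0)  # cloud opacity
    cloudy = (1.0 - alpha) * clear_img + alpha * 1.0   # clouds rendered as white
    return cloudy, alpha
```

The crucial property of such simulations is that pixels outside the synthetic cloud remain in exact pixel-wise correspondence with the target, which is precisely what real paired acquisitions cannot guarantee, and part of why synthetic benchmarks tend to look easier than the real task.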

I hope these philosophical musings provide you with helpful insights and answers to your questions!

Cheers, Patrick


[1] Meraner, A., Ebel, P., Zhu, X. X., & Schmitt, M. (2020). Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion. ISPRS Journal of Photogrammetry and Remote Sensing, 166, 333-346.

[2] Ebel, P., Meraner, A., Schmitt, M., & Zhu, X. X. (2020). Multisensor data fusion for cloud removal in global and all-season sentinel-2 imagery. IEEE Transactions on Geoscience and Remote Sensing, 59(7), 5866-5878.