Closed jnyborg closed 4 years ago
Hi jnyborg, Thanks a lot! You are making a very good point: ConvLSTMs, for instance, have shown the ability to detect and filter out cloudy pixels, and there's no reason why the attention-based approach couldn't. It's rather a case of old habits dying hard: we were using this dataset to develop earlier methods that were more sensitive to clouds, hence the cloud filtering. There is no specific reason to do it anymore, and we will remove this preprocessing step in our future datasets.
Regarding the interpolation method: yes, it requires a cloud mask, and cloudy pixels are replaced by the linear interpolation of the previous and next available clear pixels.
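For reference, that step can be sketched like this (a minimal illustration, not the Orfeo Toolbox implementation; it assumes a 1-D per-band time series with a boolean clear-sky mask, and `np.interp` clamps at the edges, so leading/trailing cloudy steps take the nearest clear value):

```python
import numpy as np

def interpolate_cloudy(values, clear_mask):
    """Replace cloudy time steps with the linear interpolation between
    the previous and next clear observations; edge steps are clamped
    to the nearest clear value."""
    t = np.arange(len(values))
    return np.interp(t, t[clear_mask], values[clear_mask])
```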
This paper is actually more relevant: it shows the same cloud-filtering ability for the Transformer.
Thanks for the help! I also suspected the filtering might be a holdover from your earlier methods. Skipping this step should make it a lot easier to preprocess my dataset.
Also, I was wondering: when you evaluate your model, do you still randomly sample pixels from test parcels? Or did you find it beneficial to include all pixels at inference time?
You're welcome! Yes, we keep the random sampling at inference time to keep things simple. I think we tried averaging the predictions over several different samplings, but with no noteworthy improvement. The upside is that once your model is trained, you could keep only S pixels from each parcel, reducing your dataset size even further for inference.
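A rough sketch of the sampling, plus the averaged-prediction variant we tried (names are illustrative; `model` stands in for any trained classifier that takes an `(S, n_channels)` pixel set):

```python
import numpy as np

def sample_pixels(parcel_pixels, S, rng):
    """Draw S pixels from a parcel's (n_pixels, n_channels) array,
    sampling with replacement when the parcel has fewer than S pixels."""
    n = parcel_pixels.shape[0]
    idx = rng.choice(n, size=S, replace=n < S)
    return parcel_pixels[idx]

def averaged_prediction(model, parcel_pixels, S, n_draws, rng):
    """Average the model's scores over several independent pixel
    samplings (test-time augmentation over pixel sets)."""
    preds = [model(sample_pixels(parcel_pixels, S, rng)) for _ in range(n_draws)]
    return np.mean(preds, axis=0)
```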
That's an interesting idea, a bit like test-time augmentation but for pixel-sets---too bad it didn't yield noteworthy improvements. Thanks again for the quick response, I should be set for now to test out your approach on my own dataset!
No worries, I'll be curious to hear how it works out. Cheers!
Can confirm it works just fine with cloudy pixels. I've been running on L1C data with cloud coverage <90%, and the lightweight version of your model recovers almost the full performance of a ResNet16+GRU model (94.8 vs 96.1 test F1 score). Tweaking your training a bit by adding weight decay and a cosine annealing learning rate schedule (which my model also benefits from) further increases F1 to 95.8, so I think your model could well surpass my CNN-based model with a bit more tuning, as you show in your paper; my CNN model also benefits from a lot of augmentation.
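For concreteness, by cosine annealing I mean the plain schedule below (a minimal sketch of the formula, not my exact training code; in PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR`, with weight decay set on the optimizer):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step`, annealed from lr_max down to lr_min
    along a half cosine over total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```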
This is super cool! In my case it's a reduction in model parameters from 25 million to 160,000, and the data shrinks from 58GB compressed to 20GB uncompressed, making it so much nicer to work with. And the fact that we can throw away 99% of the parameters with no performance drop is... well, kind of interesting. I very much look forward to experimenting further with your approach!
Thanks for the feedback! I'm glad to hear that the improvements we saw, both in performance and in technical constraints, also apply to your setup and dataset; that's good news.
Hello! Really interesting paper, and a very cool approach. I was wondering about the paragraph in Section 4.1: "The values of cloudy pixels are linearly interpolated from the first previous and next available pixel using Orfeo Toolbox", which I assume means that cloud masks are required for the satellite images and that any cloudy pixels are replaced with the nearest (in time) land surface pixel.
Is there any specific reason why you do not include cloudy pixel values? Can the network not handle them? In Section 2, under "Attention-Based Approach", you mention that "Transformer yields classification performance that is on par with RNN-based models and present the same robustness to cloud-obstructed observations", so it would seem that including them should not be a problem, or am I missing something?
Thanks!