edornd / mmflood

Flood delineation from Sentinel-1 SAR imagery
MIT License

not a bug, just some thoughts #8

Open makovez opened 10 months ago

makovez commented 10 months ago

First of all, thank you very much for your amazing work. It is not common for this kind of project to be open source on GitHub, especially with the model architectures included. I am slowly going through the paper and the code in detail, and I must say that both are well commented, well written, and well structured.

I am currently starting my dissertation project and I would like to use this as a starting point. My idea involves feeding two images into the model: a pre-flood image and a post-flood image (with some adjustments to the model architecture). I'm curious whether there is a specific reason why you decided to use single images exclusively, or whether the primary focus was on water detection.
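One simple way to feed both acquisitions to a single-image segmentation model is early fusion: stack the pre- and post-flood images along the channel axis and widen the first convolution to match. The snippet below is a minimal sketch of that idea, assuming co-registered two-channel (VV, VH) tiles in channels-first layout. The function name and tile size are hypothetical and are not taken from the mmflood code.

```python
import numpy as np

def stack_pre_post(pre: np.ndarray, post: np.ndarray) -> np.ndarray:
    """Concatenate co-registered pre/post tiles along the channel axis.

    Both inputs are assumed to be (C, H, W) arrays covering the same footprint.
    """
    if pre.shape != post.shape:
        raise ValueError(f"shape mismatch: {pre.shape} vs {post.shape}")
    return np.concatenate([pre, post], axis=0)

# Hypothetical 2-channel (VV, VH) tiles of 512x512 pixels.
pre = np.zeros((2, 512, 512), dtype=np.float32)
post = np.ones((2, 512, 512), dtype=np.float32)
fused = stack_pre_post(pre, post)
print(fused.shape)  # (4, 512, 512)
```

With this layout the rest of the network can stay unchanged; only the number of input channels of the first layer doubles.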

My second thought is about the use of EMS. While I'm aware that EMS is a well-established Copernicus service, I have yet to come across a clear explanation of the dataset used to train the models that generate the ground truth labels. These labels aren't human-generated but rather produced by algorithms. From the limited information available, it appears that these algorithms don't involve machine learning and rely primarily on Sentinel-1 or other SAR data. There is also little information about the accuracy of these algorithms.

This leads to the question: is it worth using EMS as ground truth? First, the model can't surpass the accuracy of the EMS ground truth. And if the EMS labels are flawed, we might be training the model on erroneous information. In that case, what can we expect to get out of the model?

edornd commented 10 months ago

Hi @makovez, your points are perfectly reasonable. The focus here was pretty much what you guessed: our objective was mainly to understand how feasible it is to delineate flooded areas using only the single post-event image.

About EMS, you're perfectly right: there's not much information on "how good they are", and the labels are generated using tools. However, this generation is still done by a human operator (GIS experts; if you check the bottom right of each EMS map, there's the provider's name), so in this regard it is still manually validated and, in my opinion, a decent output for ML (probably not the best, but we also took the opportunity to explore this data).

In my opinion, which is completely debatable, it is still worth using them, especially if you include pre+post images and probably also account for inaccuracies (e.g., some ad hoc loss, online relabeling, self-training, ...).
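As an illustration of what a noise-aware loss could look like, here is a minimal sketch of a "soft bootstrapping"-style binary cross-entropy, where each target is blended with the model's own prediction. This is just one possible option, not something used in this repository; the function name and the default `beta` are assumptions chosen for the example.

```python
import math

def soft_bootstrap_bce(probs, labels, beta=0.8, eps=1e-7):
    """Mean binary cross-entropy against blended targets.

    Each target is beta * label + (1 - beta) * prediction, so a prediction
    above 0.5 that contradicts a (possibly wrong) label of 0 is penalized
    less than under plain BCE. beta=1.0 recovers standard BCE.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
        t = beta * y + (1.0 - beta) * p      # blended target
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)

# A pixel the model confidently calls "flooded" while the label says "not flooded".
plain = soft_bootstrap_bce([0.9], [0], beta=1.0)
robust = soft_bootstrap_bce([0.9], [0], beta=0.8)
print(plain, robust)
```

The trade-off is that the model's own mistakes are also trusted a little more, so `beta` would need tuning against a small, carefully checked validation set.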

Lastly, I must say that this work is pretty much obsolete at the moment: we are working on a much more accurate flood detection pipeline using time series. We also intend to release the new dataset, addressing the limitations of this work.

makovez commented 10 months ago

It's good that EMS also provides the name of the person who manually reviews the labels; I didn't know that.

Do you intend to use LSTM + CNN for the time series? And if you still use EMS as labels, how do you create a GT for the time series?

It would be nice to have a chat with you, if you have Telegram: @sbongown

edornd commented 10 months ago

Can't really say much at the moment, but in terms of models we'll try new SotAs (vision transformers mostly). In terms of labels, the input is basically a "4D" tensor of S1 VV-VH images over time, with a single output for each series: the rasterized EMS delineation.
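To make the shapes concrete, here is a small sketch of one such sample. The axis order (time, channel, height, width), the series length, and the tile size are assumptions for illustration only; the comment above does not specify them.

```python
import numpy as np

# Hypothetical sizes: 6 acquisitions, 2 polarizations (VV, VH), 256x256 tile.
T, C, H, W = 6, 2, 256, 256

# One input sample: a time series of Sentinel-1 VV-VH images.
series = np.zeros((T, C, H, W), dtype=np.float32)

# One target per series: the rasterized EMS delineation (1 = flooded, 0 = not flooded).
mask = np.zeros((H, W), dtype=np.uint8)

# A batch adds a leading dimension to both.
batch_x = np.stack([series, series])  # (2, T, C, H, W)
batch_y = np.stack([mask, mask])      # (2, H, W)
print(batch_x.shape, batch_y.shape)
```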