aliFrancis / SEnSeIv2

Sensor Independent Cloud and Shadow Masking with Ambiguous Labels and Multimodal Inputs
GNU General Public License v3.0

Sensei v2 with Sentinel-2 SR and Sentinel-1 data after corrections #5

Open Geethen opened 5 months ago

Geethen commented 5 months ago

What would you suggest as the best approach to using SEnSeIv2 with surface reflectance (SR) data instead of TOA data, and with S1 data that has already gone through speckle filtering, border noise removal, etc.?

Would it be better to fine-tune SEnSeIv2 or to train it from scratch?

In either case, are there any examples to assist me with this? If not, I would be interested in putting together examples if I could get some guidance or assistance.

aliFrancis commented 5 months ago

Interesting question!

I chose not to use bottom-of-atmosphere reflectance values, mainly because for Sentinel-2 and Landsat data the atmospheric correction algorithms will already have attempted some kind of masking and correction of clouds (thin clouds and haze are actually removed through a correction). So a thin cloud that is annotated as such in L1C may go "missing" in L2A, and the model would then have a harder time knowing when to trust the annotations during training. All that being said, it's probably still very possible to train a model on L2A data for cloud masking, but I doubt the existing L1C models would work on L2A without retraining.

Maybe easier than a full fine-tuning or retraining on L2A would be to simply retrieve the L1C product from which the L2A product was derived, and use that for the cloud masking step. Is that possible for you? Or are you using data that does not have a paired Level-1 product available?
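For anyone trying the L1C route, here is a minimal sketch of pairing an L2A product with its L1C counterpart by product name. It assumes the standard Sentinel-2 compact naming convention (mission, level, sensing time, processing baseline, relative orbit, tile, product discriminator) and matches on mission, sensing time, relative orbit and tile only, since the baseline and discriminator can differ between levels. The function names and the product names in the usage example are made up for illustration; none of this is part of SEnSeIv2.

```python
import re
from typing import Iterable, Optional, Tuple

# Sentinel-2 compact product name, e.g.
# S2A_MSIL2A_<sensing>_N<baseline>_R<orbit>_T<tile>_<discriminator>[.SAFE]
_S2_NAME = re.compile(
    r"^(?P<mission>S2[A-C])_MSI(?P<level>L1C|L2A)"
    r"_(?P<sensing>\d{8}T\d{6})"
    r"_N(?P<baseline>\d{4})"
    r"_R(?P<orbit>\d{3})"
    r"_T(?P<tile>[0-9A-Z]{5})"
    r"_(?P<discriminator>\d{8}T\d{6})(?:\.SAFE)?$"
)


def parse_s2_name(name: str) -> Optional[Tuple[str, Tuple[str, str, str, str]]]:
    """Return (level, acquisition key) or None if the name does not parse.

    The key deliberately ignores the processing baseline and the product
    discriminator, which are not guaranteed to match across levels.
    """
    m = _S2_NAME.match(name)
    if m is None:
        return None
    return m["level"], (m["mission"], m["sensing"], m["orbit"], m["tile"])


def find_l1c_for_l2a(l2a_name: str, candidates: Iterable[str]) -> Optional[str]:
    """Pick the first L1C candidate covering the same acquisition and tile."""
    parsed = parse_s2_name(l2a_name)
    if parsed is None or parsed[0] != "L2A":
        raise ValueError(f"not a recognisable L2A product name: {l2a_name!r}")
    wanted_key = parsed[1]
    for candidate in candidates:
        info = parse_s2_name(candidate)
        if info is not None and info[0] == "L1C" and info[1] == wanted_key:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical product names, for illustration only.
    l2a = "S2A_MSIL2A_20240703T075611_N0510_R035_T35JPM_20240703T113015"
    l1c_listing = [
        "S2A_MSIL1C_20240703T075611_N0510_R035_T35JNM_20240703T094512",
        "S2A_MSIL1C_20240703T075611_N0510_R035_T35JPM_20240703T094512.SAFE",
    ]
    print(find_l1c_for_l2a(l2a, l1c_listing))
```

The candidate list would come from whatever catalogue search you already use; the cloud mask predicted on the matched L1C tile can then be applied to the L2A pixels, since both levels share the same tile grid.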

As for S1, I just used the data as provided in CloudSEN12, which was collected via Google Earth Engine, and so it uses the default RTC processing from there. I doubt the model will be able to handle S1 data with any other kind of preprocessing. SAR definitely needs more thought in SEnSeI: I think some parameterisation of the preprocessing style, along with data augmentation during training, is needed to make the model robust to different kinds of speckle filter, etc. This also reminds me that I need to add more detail about how to ingest SAR into SEnSeI. Currently the code does not have the built-in descriptors needed, so it would be difficult for someone else to replicate it. I will add them in the coming days so that you can at least try easily with the GEE-style Sentinel-1!
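To make the augmentation idea concrete, here is a rough sketch of what "randomising the speckle filter" during training could look like. It is not something SEnSeIv2 implements: the function names are hypothetical, a simple boxcar (mean) filter stands in for whichever speckle filters you actually care about, and the input is assumed to be backscatter in dB, as in the GEE Sentinel-1 collection. Plain Python lists are used only to keep the example dependency-free; a real pipeline would more likely use an array library.

```python
import math
import random
from typing import List, Sequence

Image = List[List[float]]


def boxcar_filter_db(image_db: Image, window: int) -> Image:
    """Mean-filter a 2-D dB image, averaging in linear power units.

    Windows are truncated at the image border rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    rows, cols = len(image_db), len(image_db[0])
    half = window // 2
    # dB -> linear power, so the mean is taken over physical intensities.
    linear = [[10.0 ** (v / 10.0) for v in row] for row in image_db]
    out: Image = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [
                linear[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(10.0 * math.log10(sum(values) / len(values)))
        out.append(out_row)
    return out


def random_speckle_augment(
    image_db: Image, rng: random.Random, windows: Sequence[int] = (1, 3, 5)
) -> Image:
    """Apply a boxcar filter with a randomly chosen window size.

    A window of 1 leaves the image (numerically almost) unchanged, so the
    model also sees unfiltered inputs.
    """
    return boxcar_filter_db(image_db, rng.choice(list(windows)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 6x6 "VV" patch in dB, for illustration only.
    patch = [[-12.0 + rng.uniform(-3.0, 3.0) for _ in range(6)] for _ in range(6)]
    augmented = random_speckle_augment(patch, rng)
    print(len(augmented), len(augmented[0]))
```

Whether this kind of augmentation is enough for a model to generalise across real speckle filters is an open question; it only illustrates the mechanism.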

Geethen commented 5 months ago

Thanks for your response.

Your decision to use L1C now makes sense. Given the discrepancies you outlined between the inputs SEnSeI was trained on and the inputs I intend to use (L8, L9 or S2 data, together with S1 data), I think it would be better to train from scratch. I am interested in burn area mapping, and I am most interested in using the SEnSeIv2 encoder.

If you do have time to create examples of using SEnSeIv2 with any of the models from smp, they would make it easier for others to pick up. The Clay team seems to be heading in that direction, and it would be great for wider adoption in the EO community.
