[NBI] Automatic sea ice segmentation in synthetic aperture radar images

louisavz commented 1 month ago

What is the notebook about?

This notebook introduces the use of unsupervised learning on synthetic aperture radar (SAR) data for sea ice classification. In particular, we are interested in ice floes, which are contiguous pieces of sea ice. The size of the floes within a region can provide important climatic information, and we aim to quantify this characteristic. The target geographical area is the Weddell Sea of West Antarctica. The example here focuses only on the Weddell Sea, although the approach should scale to other regions of interest given an appropriate training dataset.
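
As a rough illustration of what "quantifying floe size" could look like once a binary ice/water mask is available, here is a minimal sketch using connected-component labelling. This is not the notebook's method: the toy mask, the `PIXEL_SIZE_M` value and the choice of 4-connectivity are all assumptions made for the example.

```python
import numpy as np
from scipy import ndimage

# Hypothetical segmentation output: 1 = ice, 0 = open water.
ice_mask = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
    ],
    dtype=np.uint8,
)

# Assumed pixel spacing in metres (illustrative value, not taken from the data).
PIXEL_SIZE_M = 40.0

# Label contiguous ice regions; the default structuring element gives
# 4-connectivity, so diagonal neighbours are treated as separate floes.
labels, n_floes = ndimage.label(ice_mask)

# Count pixels per label and drop label 0 (the water background).
areas_px = np.bincount(labels.ravel())[1:]

# Convert pixel counts to areas in square kilometres.
areas_km2 = areas_px * PIXEL_SIZE_M**2 / 1e6

print(f"{n_floes} floes, pixel counts: {areas_px.tolist()}")
```

On real scenes the mask would come from the segmentation model, and the resulting areas could be summarised as a floe size distribution for the region.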

Packages used in this notebook will include core python functionalities, as well as

This notebook will examine:

What is SAR data
Use of SNAP toolbox
Useful preprocessing steps for enhancing SAR image quality
Example ways to create additional channels from the HH (co-polarised) and HV (cross-polarised) bands
Walk through of the model
Apply example test set to a pretrained model
Examine the results when label data is either not available or poorly annotated (through visualisation)
Some thoughts on what the results imply
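
To make the "additional channels" item above more concrete, here is a minimal sketch of two commonly used derived channels: the cross-polarisation ratio in dB and a normalised difference. It assumes HH and HV are calibrated backscatter in linear power units; the function name `make_channels`, the `eps` floor and the toy values are hypothetical and not taken from the notebook.

```python
import numpy as np


def make_channels(hh, hv, eps=1e-6):
    """Stack HH, HV and two derived channels into a (4, rows, cols) array.

    Inputs are assumed to be calibrated backscatter in linear power units.
    """
    hh = np.asarray(hh, dtype=np.float64)
    hv = np.asarray(hv, dtype=np.float64)

    # Convert to decibels, flooring at eps to avoid log10(0).
    hh_db = 10.0 * np.log10(np.maximum(hh, eps))
    hv_db = 10.0 * np.log10(np.maximum(hv, eps))

    # Cross-polarisation ratio HV/HH, expressed as a difference in dB.
    ratio_db = hv_db - hh_db

    # Normalised difference of the linear values, bounded in [-1, 1].
    norm_diff = (hh - hv) / (hh + hv + eps)

    return np.stack([hh_db, hv_db, ratio_db, norm_diff], axis=0)


# Toy 2x2 scene in which HV is one tenth of HH everywhere.
hh = np.array([[0.10, 0.20], [0.05, 0.40]])
hv = np.array([[0.01, 0.02], [0.005, 0.04]])
channels = make_channels(hh, hv)
print(channels.shape)
```

Which derived channels actually help the segmentation is an empirical question, so these two should be read as examples rather than recommendations.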

This notebook will not examine:

The preprocessing of SAR data using the SNAP tool

Data Science Component

[x] Exploration
[ ] Preprocessing
[ ] Modelling
[ ] Post-processing
[ ] Other (e.g. Reproducibility):

Submission type

[x] Standard
[ ] Special Issue
[ ] Other (e.g. CI2023 Reproducibility Challenge):

Programming language

[x] Python
[ ] R
[ ] Julia
[ ] Other:

Checklist:

[x] Input data, pipeline and/or model are public with license/citation
[ ] The proposed notebook reuses existing codebase
[x] The proposed notebook uses open-source packages
[x] The proposed notebook is associated to existing publication(s)

Additional information

Regarding existing publication(s): a paper is in preparation.

acocac commented 1 month ago

@louisavz thanks for the submission 🙌

The notebook idea and outline sound great!

Can you provide further details of the target geographical areas? You mention you won't use the preprocessing in the SNAP tool. I wondered if you've considered open-source alternatives if they exist. Also, I suggest avoiding expensive training procedures in your notebook. EDS book uses open infrastructure incl. Binder so most existing notebooks load pre-trained models for inference.

louisavz commented 1 month ago

@acocac Thank you for the feedback and comments.

The target geographical area is the Weddell Sea of West Antarctica. The example here focuses only on the Weddell Sea, although the approach should scale to other regions of interest given an appropriate training dataset.

SNAP is open-source and I'm using a script from BAS to do the preprocessing. That step runs on the BAS HPC, so I planned to skip it and provide example data that has already been corrected and calibrated. Perhaps I could provide a reference link to the toolkit and the script (if it is open-sourced)? I can include the steps used for preprocessing, and readers can use the SNAP UI tool if they have a Windows machine.

You are absolutely right, I will provide the pre-trained models for inference rather than having to train the model here. Should I revise the above from

Applying models to training data

to

Walk through of the model
Apply example test set to a pretrained model

I have also changed "models" to "model", as I will focus on only one unsupervised model to keep this concise. I hope this helps. Please let me know if I can provide more details!

acocac commented 1 month ago

> @acocac Thank you for the feedback and comments.

You're welcome. The purpose of the notebook idea stage is to provide feedback to consider in your first working version of the notebook.

> The target geographical area is the Weddell Sea of West Antarctica. The example here focuses only on the Weddell Sea, although the approach should scale to other regions of interest given an appropriate training dataset.

This is great. Please remember to mention this in the context section.

> SNAP is open-source and I'm using a script from BAS to do the preprocessing. That step runs on the BAS HPC, so I planned to skip it and provide example data that has already been corrected and calibrated. Perhaps I could provide a reference link to the toolkit and the script (if it is open-sourced)? I can include the steps used for preprocessing, and readers can use the SNAP UI tool if they have a Windows machine.

Sharing the key steps would be beneficial for readers and users of your notebook. It's great that SNAP is open-source. I'm also aware of other emerging programmatic, scalable alternatives such as eo_tools and xarray-sentinel, which build on modern open-source Python tools like GDAL/Rasterio, Xarray, Dask, and GeoPandas. However, SNAP seems to be the tool most used by EO researchers.

> You are absolutely right, I will provide the pre-trained models for inference rather than having to train the model here. Should I revise the above from
>
> Applying models to training data
>
> to
>
> Walk through of the model
> Apply example test set to a pretrained model

Thanks for adding the step. Pre-trained models usually work well. You can indeed register them in scivision if relevant.

> I have also changed "models" to "model", as I will focus on only one unsupervised model to keep this concise. I hope this helps. Please let me know if I can provide more details!

This looks good to me. Looking forward to hearing about the outcome of your publication.

louisavz commented 1 month ago

@acocac Thank you for the feedback :). I have updated the description with your suggestions.

We recently had a co-working session at BAS exploring the use of Dask for tiling large images. I would love to use the modern open-source Python tools you mentioned above more consistently.
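
For readers unfamiliar with the idea, here is a minimal sketch of tiling a large image with Dask so that a per-pixel operation runs one tile at a time. It is not the workflow from the BAS session: the random array standing in for a SAR scene, the 256 x 256 tile size and the `to_db` helper are all assumptions for illustration.

```python
import numpy as np
import dask.array as da

# Stand-in for a large SAR scene in linear power units (hypothetical data).
rng = np.random.default_rng(0)
image = rng.random((1024, 2048), dtype=np.float32)

# Wrap the array in 256 x 256 tiles; each tile becomes one Dask chunk.
tiles = da.from_array(image, chunks=(256, 256))


def to_db(block):
    """Convert one tile from linear power to decibels, flooring at 1e-6."""
    return 10.0 * np.log10(np.maximum(block, 1e-6))


# Build a lazy graph that applies the conversion tile by tile.
db_lazy = tiles.map_blocks(to_db, dtype=np.float32)

# Nothing is computed until compute() is called.
db = db_lazy.compute()
print(tiles.numblocks, db.shape)
```

This pattern only works unchanged for operations that need no information from neighbouring tiles; filters with a spatial footprint would need overlapping tiles, for example via `dask.array.map_overlap`.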

alan-turing-institute / environmental-ds-book