SatelliteShorelines / CoastSeg

An interactive toolbox for downloading satellite imagery, applying image segmentation models, mapping shoreline positions and more. The mapping extension for CoastSat and Zoo.
https://satelliteshorelines.github.io/CoastSeg/
GNU General Public License v3.0

New Feature: Outlier Detection for Extracted Shorelines #130

Open 2320sharon opened 1 year ago

2320sharon commented 1 year ago

The Problem

When shorelines are extracted from segmented imagery, some of the extracted shorelines are far from correct. Bad extracted shorelines are often caused by bad segmentations, cloud cover, no-data regions in the image, poor image resolution, or bad lighting in the image.

Solution

We need a way to detect bad shorelines and isolate them from the rest of the good shorelines. Isolating bad shorelines could be done by deleting them, moving them into a separate directory, or flagging them in some other way. A small sketch of the directory option follows below.
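
As a minimal sketch of the "separate directory" option (the function name and directory path here are illustrative, not part of CoastSeg), flagged shoreline files could be quarantined rather than deleted, so they remain available for inspection or salvage later:

```python
from pathlib import Path
import shutil

def quarantine_bad_shorelines(files, is_bad, bad_dir="bad_shorelines"):
    """Move flagged shoreline files into `bad_dir` instead of deleting them."""
    out = Path(bad_dir)
    out.mkdir(exist_ok=True)
    for f, bad in zip(files, is_bad):
        if bad:
            shutil.move(str(f), str(out / Path(f).name))
```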

Generalized Potential Solutions

  1. Use statistical techniques on the extracted shoreline points to determine where the average shoreline lies in the image. Any shoreline that does not lie within this average "zone" would be considered a bad shoreline (see the sketch below).
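
Below is a rough sketch of that idea, assuming each extracted shoreline is available as a shapely `LineString` in a common projected CRS. The function name and the MAD-based cutoff are illustrative choices, not an existing CoastSeg API:

```python
import numpy as np
from shapely.geometry import LineString

def flag_outlier_shorelines(shorelines: list[LineString], n_mads: float = 3.0):
    """For each shoreline, take the median Hausdorff distance to all the
    others; shorelines inside the typical "zone" stay close to most of the
    group, so an unusually large median distance marks an outlier."""
    n = len(shorelines)
    med_dist = np.array([
        np.median([shorelines[i].hausdorff_distance(shorelines[j])
                   for j in range(n) if j != i])
        for i in range(n)
    ])
    med = np.median(med_dist)
    mad = np.median(np.abs(med_dist - med)) or 1e-9  # robust spread, guard zero
    return med_dist > med + n_mads * mad             # True = flagged as outlier
```

Using the median pairwise distance (rather than distance to a single reference) leans on the "most shorelines are good" assumption: a handful of bad shorelines cannot drag the central tendency toward themselves.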

Resources

Solution Used by CoastSat

Here Kilian uses reject_outliers to remove shorelines with dramatic changes between consecutive timestamps. It relies on Otsu thresholding, with the MNDWI threshold calculated by CoastSat's version of extract_shorelines, which we do not use when extracting shorelines with our models. I don't think we will be able to use this function without serious modifications. That said, it can serve as inspiration for further post-processing of the extracted shorelines.

https://github.com/kvos/CoastSat/blob/master/example_jupyter.ipynb

"" 5.1 Despiking the time-series The tidally-corrected time-series of shoreline change obtained with the steps above may still contain some outliers (from cloud shadows, false detections etc). The function SDS_transects.reject_outliers() was developed to remove obvious outliers in the time-series, by removing the points that do not make physical sense in a shoreline change setting.

For example, the shoreline can experience rapid erosion after a large storm, but it will then take time to recover and return to its previous state. Therefore, if the shoreline erodes/accretes suddenly of a significant amount (max_cross_change) and then immediately returns to its previous state, this spike does not make any physical sense and can be considered an outlier. Additionally, this funciton also checks that the Otsu thresholds used to map the shoreline are within the typical range defined by otsu_threshold, with values outside this range identified as outliers. ""

```python
# remove outliers in the time-series (coastal despiking)
settings_outliers = {'max_cross_change':   40,        # maximum cross-shore change observable between consecutive timesteps
                     'otsu_threshold':     [-.5, 0],  # min and max intensity thresholds used for contouring the shoreline
                     'plot_fig':           True,      # whether to plot the intermediate steps
                    }
cross_distance = SDS_transects.reject_outliers(cross_distance, output, settings_outliers)
```
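
Since we can't reuse the Otsu-threshold check, a stripped-down, model-agnostic version of just the despiking logic might look like the sketch below. It assumes `cross_distance` is a 1-D array of cross-shore positions (in metres) ordered by time along a single transect; the `despike` function and its half-threshold recovery test are illustrative, not CoastSat code:

```python
import numpy as np

def despike(cross_distance, max_cross_change=40.0):
    """Flag spikes: a jump larger than `max_cross_change` between
    consecutive timesteps that immediately reverts to the prior state."""
    x = np.asarray(cross_distance, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    for t in range(1, len(x) - 1):
        jump = abs(x[t] - x[t - 1])          # change into this timestep
        recovery = abs(x[t + 1] - x[t - 1])  # net change across the spike
        # A sudden large change that immediately returns to the previous
        # state makes no physical sense in a shoreline-change setting.
        if jump > max_cross_change and recovery < max_cross_change / 2:
            keep[t] = False
    return x[keep], keep
```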
dbuscombe-usgs commented 1 year ago

Thanks for this summary. We should discuss options with the TCA group, too. There are a few aspects to this I'd like to break down:

  1. Flagging outlier shorelines (whole shorelines): a) assume that most shorelines are good (large-N problem?), then b) flag outliers based on deviation from central tendency. This could be done by finding shorelines that are outside the spatial extent of a typical shoreline, or ones that deviate significantly from known good shorelines within a short interval (I think this might be similar to Kilian's approach?). Both are 'physical' in the sense that they use some physical logic, like putting limits on expected rates of change.

  2. Flagging outlier transects (partial shorelines): a) assume most of the extracted shoreline is good, with some localized spots of error, then b) flag outliers based on central tendency. This is potentially much more difficult; central tendency is harder to define here (a rough sketch follows after this list).

  3. Once we have flagged the erroneous data, do we attempt to salvage it, or simply ignore it? I think there may be room for both, depending on how large the gaps are.
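
For option 2, one possible sketch of per-transect flagging, assuming the shoreline/transect intersections are already arranged in an `(n_dates, n_transects)` array; the function name and the MAD cutoff are hypothetical, not existing CoastSeg code:

```python
import numpy as np

def flag_outlier_transects(positions, n_mads=3.0):
    """positions: (n_dates, n_transects) cross-shore distances in metres,
    NaN where a transect has no intersection on a given date."""
    x = np.asarray(positions, dtype=float)
    med = np.nanmedian(x, axis=0)                # per-transect median position
    mad = np.nanmedian(np.abs(x - med), axis=0)  # per-transect spread (MAD)
    mad = np.where(mad == 0, 1e-9, mad)          # guard against zero spread
    # True where one intersection sits far from its transect's central
    # tendency; only that point is flagged, not the whole shoreline.
    return np.abs(x - med) > n_mads * mad
```

Defining central tendency per transect sidesteps the harder problem of an alongshore "typical shoreline": each transect's own history supplies the baseline, at the cost of needing enough dates per transect for the median to be meaningful.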

I'll be putting some ideas together and posting here.