SatelliteShorelines / CoastSeg

An interactive toolbox for downloading satellite imagery, applying image segmentation models, mapping shoreline positions and more. The mapping extension for CoastSat and Zoo.
https://satelliteshorelines.github.io/CoastSeg/
GNU General Public License v3.0
48 stars 10 forks source link

Feature request: merge shoreline segments into single merged shoreline #179

Closed dbuscombe-usgs closed 11 months ago

dbuscombe-usgs commented 1 year ago

Often, shorelines from different ROIs may have the same timestamp, i.e. a second shoreline extracted from an adjacent ROI image with the same timestamp.

So, we need a way to take the raw shoreline data and merged based on date. pandas/geopandas merge command seems to do this, e.g.

https://stackoverflow.com/questions/48231758/merge-pandas-dataframes-based-on-date

I think this might be best as a stand-alone script, at least initially while we test the workflow

2320sharon commented 1 year ago

I agree the fastest way to get this out would be a script. I can investigate how to do a few things

  1. How to merge the raw time-series shorelines for the csv files with all the transects
  2. How to merge the raw time-series shorelines for the csv files for each transect
  3. How to merge the geojson shorelines

    I think this issue is do-able as a V1.

2320sharon commented 1 year ago

Today I took some time to scope out this issue. Below I've outlined the prerequisites to merge ROIs, possible scenarios that could be encountered when combining roi's, as well as some questions I've come across while scoping out this issue. I've also included some demo code that introduces how we might merge the extracted shorelines together. I plan on working on this issue more during the week.

Prerequisites to Merge ROIs

  1. At least two ROIs must be provided
  2. Each Roi must have both config files present
  3. A raw time series CSV must be present

    Possible Scenarios

    Each of these scenarios outline possible inputs that could influence the way the roi's are merged.

    1. Roi's have different settings in their config files.
    2. ROIs have overlapping transects
    3. ROIs have overlapping shorelines for the same date time. This would be the result of overlapping roi's.
    4. ROIs are from different sessions
    5. ROIs have the same IDs
      • if this scenario occurs I believe the merge operation should be aborted

    Files to be Merged

    1. Config JSON files
    2. Config_gdf.geojson files
    3. Raw time series CSV files
    4. Corrected time series CSV files

Questions

  1. How would we merge discontinuous shorelines together?
  1. Would we want the shoreline detection images to be copied to this new merged session? @dbuscombe-usgs
  2. If the same transect is present in multiple rois and multiple shorelines intersect the same transect for the same date time what should happen to the intersecting Shoreline points?

    • one idea I had was to take the average of all the intersecting points excluding any non-nan points
    • the outputs would be saved to a new CSV that would include all of the transects present across all the rois
    • from this point new extracted Shoreline vectors could be derived.

    Basic Flow

    1. Load in all ROIs
    2. Scan the raw Time series CSV files to determine if there are any shoreline points that have intersected the same transect on the same date time.
    3. Once these points are identified from step 2, these points would be averaged or some other method, so that each transect only has one intersecting Shoreline point for each exact date time.
    4. Once all the intersections that took place on the same date times are merged (step 3) then a new set of extracted shoreline geojson files can be formed
    5. Then the new config files could be saved
    6. All the new merged data would be saved to a new directory.

Sample Code

The following sample code outlines how two extracted Shoreline Geo Json files can be merged together. Please note that we might not take this approach as we are still brainstorming possible solutions.

import geopandas as gpd
import pandas as pd
import os

# Load the GeoJSON files into GeoDataFrames
sessions_directory = r'C:\development\doodleverse\coastseg\CoastSeg\sessions'
session_directory = 'ID_lty16_lty13_extract_shorelines'
ROI_directory = 'ID_lty13_datetime10-10-23__09_39_43'
filename = 'extracted_shorelines_lines.geojson'
# load ROI file 1
gdf_file= os.path.join(sessions_directory,session_directory,ROI_directory,filename)
gdf1 = gpd.read_file(gdf_file)
# load ROI file 2
ROI_directory = 'ID_lty16_datetime10-10-23__09_39_43'
gdf_file= os.path.join(sessions_directory,session_directory,ROI_directory,filename)
gdf2 = gpd.read_file(gdf_file)

# Ensure that the date columns are in datetime format
gdf1['date'] = pd.to_datetime(gdf1['date'])
gdf2['date'] = pd.to_datetime(gdf2['date'])

# Merge the two GeoDataFrames on the date column
merged_gdf = gdf1.merge(gdf2, on='date', how='outer', suffixes=('_gdf1', '_gdf2'))

# Define a function to merge geometries
def merge_geometries(row):
    geom1 = row['geometry_gdf1'] if 'geometry_gdf1' in row and pd.notnull(row['geometry_gdf1']) else None
    geom2 = row['geometry_gdf2'] if 'geometry_gdf2' in row and pd.notnull(row['geometry_gdf2']) else None

    # Handle cases where one of the geometries might be missing
    if geom1 and geom2:
        return geom1.union(geom2)
    elif geom1:
        return geom1
    elif geom2:
        return geom2

# Apply the function to each row to merge the geometries
merged_gdf['merged_geometry'] = merged_gdf.apply(merge_geometries, axis=1)

# Keep only the date and merged_geometry columns
final_gdf = merged_gdf[['date', 'merged_geometry']]
final_gdf.columns = ['date', 'geometry']

# Convert to a GeoDataFrame
final_gdf = gpd.GeoDataFrame(merged_gdf[['date', 'merged_geometry']], geometry='merged_geometry')
final_gdf.columns = ['date', 'geometry']

# Ensure the geometry column
final_gdf.set_geometry('geometry', inplace=True)

# Convert the date column to string
final_gdf['date'] = final_gdf['date'].astype(str)

# Set a CRS if it doesn't already have one
if final_gdf.crs is None:
    final_gdf.set_crs(epsg=4326, inplace=True)  # WGS84 CRS

print(final_gdf)
final_gdf.to_file('finalized_outer_gdf.geojson',driver="GeoJSON")
2320sharon commented 1 year ago

Would we want the shoreline detection images to be copied to this new merged session? @dbuscombe-usgs

Nope.

2320sharon commented 1 year ago

@dbuscombe-usgs I've got a question about merging shorelines: what do we do if we have 2 overlapping shorelines for the same satellite and date? The images below are 2 shorelines I created where a portion of the shorelines overlap (its hard to see but the bottom red portion is overlapped by both shorelines, but in some of the area where these 2 shorelines exist they are not the same. As seen by the red spiked shoreline and the blue straight shoreline. The purple shoreline was created by merging the 2 shorelines with a shapely unary union operation, which just merged the 2 geometries together. I don't think we want this as it creates regions where 2 or more shorelines can exist for the same exact time.

We could average the the points across the shorelines. Basically if any of the shoreline points don't perfectly overlap for the same time date we create a new averaged shoreline out of the shorelines. Then use this new shoreline to intersect with the transects. If you have any ideas let me know.

Also I realized that the approach of averaging the intersection values for shoreline points that intersect with transects for the same satellite and date, won't work because the shoreline points are derived from the shoreline vectors. Therefore after merge the shoreline vectors would need to recompute the shoreline transect intersections for any new merged shorelines we created.

image image

2320sharon commented 1 year ago

So I created a little bit of sample code to demonstrate the idea of merging the 2 shorelines together via an average technique. The result is this brown shoreline. However this strategy is not perfect as the new averaged shoreline is much shorter than the blue shoreline due to the new shoreline being created out of midpoints of the 2 shorelines. We could modify this approach to ensure the final shoreline is the length of the 2 component shorelines.

image

import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString

def resample_line(line, num_points):
    """Resample a LineString to have a specified number of evenly spaced points."""
    distances = np.linspace(0, line.length, num_points)
    return LineString([line.interpolate(distance) for distance in distances])

def average_shorelines(line1, line2, num_points):
    """Calculate the average shoreline between two LineStrings."""
    # Resample the input lines
    resampled_line1 = resample_line(line1, num_points)
    resampled_line2 = resample_line(line2, num_points)

    # Calculate the midpoints
    midpoints = [((coord1[0] + coord2[0]) / 2, (coord1[1] + coord2[1]) / 2)
                 for coord1, coord2 in zip(resampled_line1.coords, resampled_line2.coords)]

    return LineString(midpoints)

# read the shorelines from the file
import geopandas as gpd
gdf = gpd.read_file(r'C:\development\doodleverse\coastseg\CoastSeg\overlapping_shorelines.geojson', driver='GeoJSON')
# create a new interpolated shorelines using 20 points
new_linestring=average_shorelines(gdf.iloc[0][0],gdf.iloc[1][0],20)
# save the new averaged shoreline to a geojson file
new_gdf = gpd.GeoDataFrame(geometry=[new_linestring])
new_gdf.to_file('merged_averaged_overlapping_shorelines.geojson')

geojson_files.zip

2320sharon commented 1 year ago

So I did a bit more thinking on this idea. This issue will arise when 2 or more ROIs overlap, so we should be able to isolate the portions of the shorelines that have the potential to overlap, then average just the shorelines in that region.

Steps

  1. Clip shorelines to the regions of the ROI that overlap
  2. If more than 2 shorelines from 2 or more different ROI exist in this clipped region then continue to step 3
    • if not then just do a unary union of these shorelines
  3. Create a new average shoreline
  4. Interpolate this new shoreline to the shoreline vectors in the rest of the ROIs to create a new shoreline.

Example ROIs

image

Example Merged Shoreline

dbuscombe-usgs commented 1 year ago

Sorry for late response - yes, I like the last idea you had about averaging those overlapped shorelines, then appending that averaged segment to the rest of the shoreline from the two ROIs. Thanks for presenting these options!

2320sharon commented 1 year ago

All good, I'll give that approach a shot and see how it performs. Also I'm pretty sure the approach will struggle with back barrier shorelines, so expect some wonky results for back barrier shorelines or regions where there are discontinuous shorelines.

dbuscombe-usgs commented 1 year ago

Agree. This merge operation will be tricky for backbarrier shoreline workflows, which are tricky in general because inner shorelines tend to be far more complicated in 2d planform shape that outer coast ones. So much so that users should probably be encouraged to use coastseg only for the outer shorelines, where possible

2320sharon commented 1 year ago

I've been working on the script to merge extracted Shoreline sessions together. I want to leave a quick update while I keep working on it today. I've found that the method we use of averaging the points that make up each of the extracted shorelines Works moderately well when the shorelines are the same shape as you can see an example one. However if the shorelines are vastly different shapes as seen an example 2 the method falls apart, which means the user will need to filter out shorelines before they use this script.

I've found that not all the extracted shorelines are continuous single line strings. As you can see an example two which has multiline strings this complicates the averaging process as each line string in each multiline string has to be averaged together and depending on the orientation of each line string this can result in vastly different shapes.

To improve this method at some point we should probably take an account the distance each line string is from each other and find a way to average the line strings that are closest together first then the line strings that aren't. I think if we implement this it would improve the results.

Example 1

Code

import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString, MultiLineString

def resample_line(line, num_points):
    """Resample a LineString to have a specified number of evenly spaced points."""
    distances = np.linspace(0, line.length, num_points)
    return LineString([line.interpolate(distance) for distance in distances])

# Redefining the average_lines function
def average_lines(line1, line2, num_points):
    """Calculate the average LineString between two LineStrings."""
    # Resample the input lines
    resampled_line1 = resample_line(line1, num_points)
    resampled_line2 = resample_line(line2, num_points)

    # Calculate the midpoints
    midpoints = [((coord1[0] + coord2[0]) / 2, (coord1[1] + coord2[1]) / 2)
                 for coord1, coord2 in zip(resampled_line1.coords, resampled_line2.coords)]

    return LineString(midpoints)

def average_shorelines(geom1, geom2, num_points):
    """Calculate the average shoreline between two geometries."""
    if isinstance(geom1, LineString) and isinstance(geom2, LineString):
        return average_lines(geom1, geom2, num_points)
    elif isinstance(geom1, MultiLineString) and isinstance(geom2, MultiLineString):
        averaged_lines = []
        for line1, line2 in zip(geom1.geoms, geom2.geoms):
            averaged_line = average_lines(line1, line2, num_points)
            averaged_lines.append(averaged_line)
        return MultiLineString(averaged_lines)
    elif isinstance(geom1, MultiLineString) and isinstance(geom2, LineString):
        averaged_lines = []
        for line in geom1.geoms:
            averaged_line = average_lines(line, geom2, num_points)
            averaged_lines.append(averaged_line)
        return MultiLineString(averaged_lines)
    elif isinstance(geom1, LineString) and isinstance(geom2, MultiLineString):
        averaged_lines = []
        for line in geom2.geoms:
            averaged_line = average_lines(geom1, line, num_points)
            averaged_lines.append(averaged_line)
        return MultiLineString(averaged_lines)
    else:
        raise TypeError("Input geometries must be LineString or MultiLineString")

# create a new interpolated shorelines using 54 points (original number of points in the extracted shoreline)
# note: merged_gdf is a merged geodataframe performed on the 'date' column, 'geometry_x' and 'geometry_y' are the geometries from the geodataframes that were merged
new_linestring=average_shorelines(merged_gdf['geometry_x'].iloc[0],merged_gdf['geometry_y'].iloc[0],54)
new_linestring

Plotting Code

# Re-importing required libraries
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString, MultiLineString

# Select the geometries to be plotted
geom_x = gpd.GeoSeries(merged_gdf['geometry_x'].iloc[0])
geom_y = gpd.GeoSeries(merged_gdf['geometry_y'].iloc[0])
new_linestring_series = gpd.GeoSeries(new_linestring)

# Create a new plot
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the geometries
geom_x.plot(ax=ax, color='blue', label='Geometry X')
geom_y.plot(ax=ax, color='green', label='Geometry Y')
new_linestring_series.plot(ax=ax, color='red', linestyle='--', linewidth=2, label='New Linestring')

# Add a legend
ax.legend()

# Set the aspect of the plot to be equal
ax.set_aspect('equal', adjustable='datalim')

# Save the plot to a PNG file
plt.savefig('averaged_shoreline.png', dpi=300)

plt.show()
2320sharon commented 1 year ago

I've made some headway on the merge sessions notebook for CoastSeg:

  1. A new module, merge_utils.py, now houses all the merge-related functions
  2. I've written many very detailed tests that are in test_merge_utils.py (I still need to write a few more)
  3. I've added capability to merge shorelines from the same ROI across different dates, this makes it easier to combine sessions for analyzed shorelines for specific date ranges

merge_sessions.zip

The purpose of the merge sessions notebook is to combine all the shorelines, ROIs, and transect data into a unified archive. This compiled data is not intended to be loaded into CoastSeg due to the complex dependencies and data references that would be compromised. To accommodate loading merged sessions into Coastseg, would necessitate extensive modifications to both CoastSeg and CoastSat. Such an overhaul would require in-depth consultation with users and a thorough review of all processes within both systems.

I think our resources are better spent getting the Planet API working and refining CoastSeg’s internal architecture. This approach will lay a stronger foundation, potentially simplifying the future incorporation of merged sessions.

2320sharon commented 11 months ago

test_case1_same_rois.zip test_case2_different_rois.zip test_case3_rois_overlapping_dates.zip test_case4_overlapping.zip

I made sample data to test merge_session.py with

2320sharon commented 11 months ago

merge_sessions.py Script Usage Instructions

Getting Started

To utilize the script in the development branch of CoastSeg, follow these instructions.

Step 1: Switch to the Development Branch

Begin by switching to the specified development branch:

git switch issue/179

Step 2: Environment Activation

Activate your coastseg environment to ensure proper dependencies are in place:

conda activate coastseg

Step 3: Install the CoastSeg Package

Install the CoastSeg package in editable mode. This will synchronize the CoastSeg version with the current branch's code.

pip install -e .

Note: The -e flag is essential for installing the package in a way that reflects changes made in the branch.

Running the Script

Preparing the Environment

Navigate to the scripts directory with the coastseg directory and open the merge_sessions.py script in Visual Studio Code:

cd scripts
code merge_sessions.py

Executing the Script

Example 1: Merging Two Sessions into a Single Merged Session

Consider merging two sessions with extracted shorelines into a single session. The following are the locations of 2 sessions from the test data provided.

   r"C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50"
    r"C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50"
  1. -i Specify the session locations separating each location with a space
  2. -s Specify that new session directory should be saved at "merged_sessions" directory within the script directory
  3. -n The merged session directory name will be "merged_session1"
  4. Don't specify any of the transect settings because default settings for shoreline transect intersections will be used.

To execute, separate each session location with a space (use double quotes on Windows):

python merge_sessions.py -i "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50" "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50" -s "merged_sessions" -n "merged_session1"

Example 2: Customizing Parameters

Merge the same sessions but modify the min points to 2 and the prc_multiple to 0.3:

python merge_sessions.py -i "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50" "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50" -s "merged_sessions" -n "merged_session1" -pm 0.3 -mp 2

Please follow these steps carefully to ensure successful execution of the script in the development branch of CoastSeg.