Closed dbuscombe-usgs closed 11 months ago
I agree the fastest way to get this out would be a script. I can investigate how to do a few things
How to merge the geojson shorelines
I think this issue is do-able as a V1.
Today I took some time to scope out this issue. Below I've outlined the prerequisites to merge ROIs, possible scenarios that could be encountered when combining roi's, as well as some questions I've come across while scoping out this issue. I've also included some demo code that introduces how we might merge the extracted shorelines together. I plan on working on this issue more during the week.
A raw time series CSV must be present
Each of these scenarios outline possible inputs that could influence the way the roi's are merged.
If the same transect is present in multiple rois and multiple shorelines intersect the same transect for the same date time what should happen to the intersecting Shoreline points?
The following sample code outlines how two extracted Shoreline Geo Json files can be merged together. Please note that we might not take this approach as we are still brainstorming possible solutions.
import geopandas as gpd
import pandas as pd
import os
# Load the GeoJSON files into GeoDataFrames
sessions_directory = r'C:\development\doodleverse\coastseg\CoastSeg\sessions'
session_directory = 'ID_lty16_lty13_extract_shorelines'
ROI_directory = 'ID_lty13_datetime10-10-23__09_39_43'
filename = 'extracted_shorelines_lines.geojson'
# load ROI file 1
gdf_file= os.path.join(sessions_directory,session_directory,ROI_directory,filename)
gdf1 = gpd.read_file(gdf_file)
# load ROI file 2
ROI_directory = 'ID_lty16_datetime10-10-23__09_39_43'
gdf_file= os.path.join(sessions_directory,session_directory,ROI_directory,filename)
gdf2 = gpd.read_file(gdf_file)
# Ensure that the date columns are in datetime format
gdf1['date'] = pd.to_datetime(gdf1['date'])
gdf2['date'] = pd.to_datetime(gdf2['date'])
# Merge the two GeoDataFrames on the date column
merged_gdf = gdf1.merge(gdf2, on='date', how='outer', suffixes=('_gdf1', '_gdf2'))
# Define a function to merge geometries
def merge_geometries(row):
geom1 = row['geometry_gdf1'] if 'geometry_gdf1' in row and pd.notnull(row['geometry_gdf1']) else None
geom2 = row['geometry_gdf2'] if 'geometry_gdf2' in row and pd.notnull(row['geometry_gdf2']) else None
# Handle cases where one of the geometries might be missing
if geom1 and geom2:
return geom1.union(geom2)
elif geom1:
return geom1
elif geom2:
return geom2
# Apply the function to each row to merge the geometries
merged_gdf['merged_geometry'] = merged_gdf.apply(merge_geometries, axis=1)
# Keep only the date and merged_geometry columns
final_gdf = merged_gdf[['date', 'merged_geometry']]
final_gdf.columns = ['date', 'geometry']
# Convert to a GeoDataFrame
final_gdf = gpd.GeoDataFrame(merged_gdf[['date', 'merged_geometry']], geometry='merged_geometry')
final_gdf.columns = ['date', 'geometry']
# Ensure the geometry column
final_gdf.set_geometry('geometry', inplace=True)
# Convert the date column to string
final_gdf['date'] = final_gdf['date'].astype(str)
# Set a CRS if it doesn't already have one
if final_gdf.crs is None:
final_gdf.set_crs(epsg=4326, inplace=True) # WGS84 CRS
print(final_gdf)
final_gdf.to_file('finalized_outer_gdf.geojson',driver="GeoJSON")
Would we want the shoreline detection images to be copied to this new merged session? @dbuscombe-usgs
Nope.
@dbuscombe-usgs I've got a question about merging shorelines: what do we do if we have 2 overlapping shorelines for the same satellite and date? The images below are 2 shorelines I created where a portion of the shorelines overlap (its hard to see but the bottom red portion is overlapped by both shorelines, but in some of the area where these 2 shorelines exist they are not the same. As seen by the red spiked shoreline and the blue straight shoreline. The purple shoreline was created by merging the 2 shorelines with a shapely unary union operation, which just merged the 2 geometries together. I don't think we want this as it creates regions where 2 or more shorelines can exist for the same exact time.
We could average the the points across the shorelines. Basically if any of the shoreline points don't perfectly overlap for the same time date we create a new averaged shoreline out of the shorelines. Then use this new shoreline to intersect with the transects. If you have any ideas let me know.
Also I realized that the approach of averaging the intersection values for shoreline points that intersect with transects for the same satellite and date, won't work because the shoreline points are derived from the shoreline vectors. Therefore after merge the shoreline vectors would need to recompute the shoreline transect intersections for any new merged shorelines we created.
So I created a little bit of sample code to demonstrate the idea of merging the 2 shorelines together via an average technique. The result is this brown shoreline. However this strategy is not perfect as the new averaged shoreline is much shorter than the blue shoreline due to the new shoreline being created out of midpoints of the 2 shorelines. We could modify this approach to ensure the final shoreline is the length of the 2 component shorelines.
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString
def resample_line(line, num_points):
"""Resample a LineString to have a specified number of evenly spaced points."""
distances = np.linspace(0, line.length, num_points)
return LineString([line.interpolate(distance) for distance in distances])
def average_shorelines(line1, line2, num_points):
"""Calculate the average shoreline between two LineStrings."""
# Resample the input lines
resampled_line1 = resample_line(line1, num_points)
resampled_line2 = resample_line(line2, num_points)
# Calculate the midpoints
midpoints = [((coord1[0] + coord2[0]) / 2, (coord1[1] + coord2[1]) / 2)
for coord1, coord2 in zip(resampled_line1.coords, resampled_line2.coords)]
return LineString(midpoints)
# read the shorelines from the file
import geopandas as gpd
gdf = gpd.read_file(r'C:\development\doodleverse\coastseg\CoastSeg\overlapping_shorelines.geojson', driver='GeoJSON')
# create a new interpolated shorelines using 20 points
new_linestring=average_shorelines(gdf.iloc[0][0],gdf.iloc[1][0],20)
# save the new averaged shoreline to a geojson file
new_gdf = gpd.GeoDataFrame(geometry=[new_linestring])
new_gdf.to_file('merged_averaged_overlapping_shorelines.geojson')
So I did a bit more thinking on this idea. This issue will arise when 2 or more ROIs overlap, so we should be able to isolate the portions of the shorelines that have the potential to overlap, then average just the shorelines in that region.
Sorry for late response - yes, I like the last idea you had about averaging those overlapped shorelines, then appending that averaged segment to the rest of the shoreline from the two ROIs. Thanks for presenting these options!
All good, I'll give that approach a shot and see how it performs. Also I'm pretty sure the approach will struggle with back barrier shorelines, so expect some wonky results for back barrier shorelines or regions where there are discontinuous shorelines.
Agree. This merge operation will be tricky for backbarrier shoreline workflows, which are tricky in general because inner shorelines tend to be far more complicated in 2d planform shape that outer coast ones. So much so that users should probably be encouraged to use coastseg only for the outer shorelines, where possible
I've been working on the script to merge extracted Shoreline sessions together. I want to leave a quick update while I keep working on it today. I've found that the method we use of averaging the points that make up each of the extracted shorelines Works moderately well when the shorelines are the same shape as you can see an example one. However if the shorelines are vastly different shapes as seen an example 2 the method falls apart, which means the user will need to filter out shorelines before they use this script.
I've found that not all the extracted shorelines are continuous single line strings. As you can see an example two which has multiline strings this complicates the averaging process as each line string in each multiline string has to be averaged together and depending on the orientation of each line string this can result in vastly different shapes.
To improve this method at some point we should probably take an account the distance each line string is from each other and find a way to average the line strings that are closest together first then the line strings that aren't. I think if we implement this it would improve the results.
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString, MultiLineString
def resample_line(line, num_points):
"""Resample a LineString to have a specified number of evenly spaced points."""
distances = np.linspace(0, line.length, num_points)
return LineString([line.interpolate(distance) for distance in distances])
# Redefining the average_lines function
def average_lines(line1, line2, num_points):
"""Calculate the average LineString between two LineStrings."""
# Resample the input lines
resampled_line1 = resample_line(line1, num_points)
resampled_line2 = resample_line(line2, num_points)
# Calculate the midpoints
midpoints = [((coord1[0] + coord2[0]) / 2, (coord1[1] + coord2[1]) / 2)
for coord1, coord2 in zip(resampled_line1.coords, resampled_line2.coords)]
return LineString(midpoints)
def average_shorelines(geom1, geom2, num_points):
"""Calculate the average shoreline between two geometries."""
if isinstance(geom1, LineString) and isinstance(geom2, LineString):
return average_lines(geom1, geom2, num_points)
elif isinstance(geom1, MultiLineString) and isinstance(geom2, MultiLineString):
averaged_lines = []
for line1, line2 in zip(geom1.geoms, geom2.geoms):
averaged_line = average_lines(line1, line2, num_points)
averaged_lines.append(averaged_line)
return MultiLineString(averaged_lines)
elif isinstance(geom1, MultiLineString) and isinstance(geom2, LineString):
averaged_lines = []
for line in geom1.geoms:
averaged_line = average_lines(line, geom2, num_points)
averaged_lines.append(averaged_line)
return MultiLineString(averaged_lines)
elif isinstance(geom1, LineString) and isinstance(geom2, MultiLineString):
averaged_lines = []
for line in geom2.geoms:
averaged_line = average_lines(geom1, line, num_points)
averaged_lines.append(averaged_line)
return MultiLineString(averaged_lines)
else:
raise TypeError("Input geometries must be LineString or MultiLineString")
# create a new interpolated shorelines using 54 points (original number of points in the extracted shoreline)
# note: merged_gdf is a merged geodataframe performed on the 'date' column, 'geometry_x' and 'geometry_y' are the geometries from the geodataframes that were merged
new_linestring=average_shorelines(merged_gdf['geometry_x'].iloc[0],merged_gdf['geometry_y'].iloc[0],54)
new_linestring
# Re-importing required libraries
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString, MultiLineString
# Select the geometries to be plotted
geom_x = gpd.GeoSeries(merged_gdf['geometry_x'].iloc[0])
geom_y = gpd.GeoSeries(merged_gdf['geometry_y'].iloc[0])
new_linestring_series = gpd.GeoSeries(new_linestring)
# Create a new plot
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the geometries
geom_x.plot(ax=ax, color='blue', label='Geometry X')
geom_y.plot(ax=ax, color='green', label='Geometry Y')
new_linestring_series.plot(ax=ax, color='red', linestyle='--', linewidth=2, label='New Linestring')
# Add a legend
ax.legend()
# Set the aspect of the plot to be equal
ax.set_aspect('equal', adjustable='datalim')
# Save the plot to a PNG file
plt.savefig('averaged_shoreline.png', dpi=300)
plt.show()
I've made some headway on the merge sessions notebook for CoastSeg:
The purpose of the merge sessions notebook is to combine all the shorelines, ROIs, and transect data into a unified archive. This compiled data is not intended to be loaded into CoastSeg due to the complex dependencies and data references that would be compromised. To accommodate loading merged sessions into Coastseg, would necessitate extensive modifications to both CoastSeg and CoastSat. Such an overhaul would require in-depth consultation with users and a thorough review of all processes within both systems.
I think our resources are better spent getting the Planet API working and refining CoastSeg’s internal architecture. This approach will lay a stronger foundation, potentially simplifying the future incorporation of merged sessions.
test_case1_same_rois.zip test_case2_different_rois.zip test_case3_rois_overlapping_dates.zip test_case4_overlapping.zip
I made sample data to test merge_session.py with
To utilize the script in the development branch of CoastSeg, follow these instructions.
Begin by switching to the specified development branch:
git switch issue/179
Activate your coastseg
environment to ensure proper dependencies are in place:
conda activate coastseg
Install the CoastSeg package in editable mode. This will synchronize the CoastSeg version with the current branch's code.
pip install -e .
Note: The -e
flag is essential for installing the package in a way that reflects changes made in the branch.
Navigate to the scripts directory with the coastseg directory and open the merge_sessions.py
script in Visual Studio Code:
cd scripts
code merge_sessions.py
Consider merging two sessions with extracted shorelines into a single session. The following are the locations of 2 sessions from the test data provided.
r"C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50"
r"C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50"
-i
Specify the session locations separating each location with a space-s
Specify that new session directory should be saved at "merged_sessions" directory within the script directory-n
The merged session directory name will be "merged_session1"To execute, separate each session location with a space (use double quotes on Windows):
python merge_sessions.py -i "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50" "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50" -s "merged_sessions" -n "merged_session1"
Merge the same sessions but modify the min points
to 2 and the prc_multiple
to 0.3:
python merge_sessions.py -i "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac6_datetime10-30-23__01_44_50" "C:\development\doodleverse\coastseg\CoastSeg\test_data\test_case4_overlapping\ID_gac1_datetime10-30-23__01_44_50" -s "merged_sessions" -n "merged_session1" -pm 0.3 -mp 2
Please follow these steps carefully to ensure successful execution of the script in the development branch of CoastSeg.
Often, shorelines from different ROIs may have the same timestamp, i.e. a second shoreline extracted from an adjacent ROI image with the same timestamp.
So, we need a way to take the raw shoreline data and merged based on date. pandas/geopandas
merge
command seems to do this, e.g.https://stackoverflow.com/questions/48231758/merge-pandas-dataframes-based-on-date
I think this might be best as a stand-alone script, at least initially while we test the workflow