GeoscienceAustralia / dea-notebooks

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and Xarray
https://docs.dea.ga.gov.au/notebooks/
Apache License 2.0
433 stars 127 forks source link

Possible bug in get_coastlines() function for northern Australia #1212

Open eatanygoodbookslately opened 4 months ago

eatanygoodbookslately commented 4 months ago

Dear DEA Notebooks team

I hope you all are well. I’m a university research fellow and a happy user of DEA Coastlines for more than a year. I think I’ve discovered a bug in the 2021 Australian coastline data, specifically with the application of a bounding box to the data prior to download of the GeoDataFrame of line geometries:

myVariableName = get_coastlines(bbox=(xWest,ySouth,xEast,yNorth),crs="EPSG:4283")

According to the well-known GDA94 coordinate reference system (described for EPSG:4283 units of lat/long degrees here), the virtual borders to this dataset should be as below (the variable names are just indicative, they’re not actual keywords for the get_coastlines ‘bbox’ argument)

-xEast = 93.41 #degrees latitude -xWest = 173.34 #degrees latitude -ySouth = -60.55 #degrees longitude -yNorth = -8.47 #degrees longitude

The apparent bug is that when yNorth is set to its EPSG:4283 value (-8.47), the northern extents of Australia become cropped out (see below). Only when yNorth is set to an unrealistically positive-tending value (zero degrees, i.e. equatorial) does the ‘get_coastlines()’ function eliminate its erroneous cropping (see below):

I was wondering if someone on the DEA Notebooks team could comment on this. To reproduce this error in Python, you can run the below Python code from within a DEA Sandbox notebook (not sure if I can attach JupyterLab notebooks here so I converted it into a Python script below - this can be run by pasting it into a single cell or splitting it at the "# SECTION" points).

# SECTION 1 - # Import & test modules
#IMPORT PYTHON STANDARD LIBRARY MODULES
import os,sys,pickle

#IMPORT COMMON EXTERNAL PYTHON PACKAGES
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

#IMPORT GIS PYTHON PACKAGES
import geopandas as gpd
sys.path.insert(1,'../Tools/')
import dea_tools
from dea_tools.coastal import get_coastlines

print("IMPORT: matplotlib version =", mpl.__version__)
print("IMPORT: pandas version =", pd.__version__)
print("IMPORT: geopandas version =", gpd.__version__)
print("IMPORT: dea_tools version =", dea_tools.__version__)

# SECTION 2 - Coastline data import
#Specify bounding box using CRS EPSG:4283 (degrees, GDA94)
#Source of bbox limits below was [https://epsg.io/4283] as at 2024/03/17
#Metadata states "Australia including Lord Howe Island, Macquarie Island, Ashmore and Cartier Islands, Christmas Island, Cocos (Keeling) Islands, Norfolk Island. All onshore and offshore."
#Most recent computational runtime for this (slowest in the notebook) cell was up to 14 minutes on 17th March 2024

xEast = 93.41
xWest = 173.34
ySouth = -60.55
yNorth = -8.47 #Clips off the northern extents of Australia

#Import DEA Coastlines dataset within an Australia-wide bounding box
deacl_annualshorelines_gdf = get_coastlines(bbox=(xEast,ySouth,xWest,yNorth),crs="EPSG:4283")

#Print dataset metadata (EPSG:3577 is GDA94 in metres)
print(f"PROGRESS: Dtype of 'year' column is {deacl_annualshorelines_gdf['year'].dtype}")
print(f"PROGRESS: CRS of dataset is {deacl_annualshorelines_gdf.crs}")

# SECTION 3 - Data visualisation
#Plot the matching GDF rows
deacl_shorelines_gdf_year_needed.to_crs(epsg=4283).plot();

I was advised to try the Web Feature Service interface to the DEA Coastlines dataset and fortunately found this is working just fine

#Specify bounding box to download coastline for Australia (lat/long degrees in GDA94 are via EPSG:4283)
#Last runtime for this cell was ~5 minutes
ymax,xmin = -8.47,93.41
ymin,xmax = -60.55,173.34

#Set up WFS requests for annual shorelines (WGS84 in degrees is EPSG:4326)
#Info is at "https://knowledge.dea.ga.gov.au/data/product/dea-coastlines/?tab=access"
deacl_annualshorelines_wfs = f'https://geoserver.dea.ga.gov.au/geoserver/wfs?' \
                       f'service=WFS&version=1.1.0&request=GetFeature' \
                       f'&typeName=dea:shorelines_annual' \
                       f'&bbox={ymin},{xmin},{ymax},{xmax},' \
                       f'urn:ogc:def:crs:EPSG:4326'

#Load DEA Coastlines data from WFS
deacl_annualshorelines_gdf = gpd.read_file(deacl_annualshorelines_wfs)

#Ensure CRS is set correctly
deacl_annualshorelines_gdf.crs = 'EPSG:3577'

#Isolate a year and plot it
deacl_annualshorelines_2021_gdf = deacl_annualshorelines_gdf[deacl_annualshorelines_gdf['year']==2021.0]
deacl_annualshorelines_2021_gdf.plot();

Thank you for your time and my apologies if I've omitted any information (this is actually my first open-source bug report!)