PermafrostDiscoveryGateway / viz-staging

PDG Visualization staging pipeline
Apache License 2.0
2 stars 1 forks source link

Using WorldCRS84Quad TMS gives error: Cannot determine common CRS for concatenation inputs #11

Closed robyngit closed 1 year ago

robyngit commented 1 year ago

This issue appeared after we updated some packages, where suddenly the concatenation step in the TileStager was giving the error:

Cannot determine common CRS for concatenation inputs, got ['WGS 84 (CRS84)', 'WGS 84 (CRS84)']. Use to_crs() to transform geometries to the same CRS before merging.

This issue was originally outlined in https://github.com/PermafrostDiscoveryGateway/viz-workflow/issues/10, and as Juliet documented, it's related to geopandas raising a new error in v0.12+ if two GDFs have differing CRSs during concatenation.

After some more investigation, it seems that some of the CRS information is being lost for certain CRSs when we save tiles. When we re-open those tiles to merge with new polygons and deduplicate, the CRSs differ slightly. Here is a minimal example:

import geopandas as gpd
import pandas as pd
import morecantile

input_filename = 'file1.gpkg'
reproj_filename = 'file1_tmsCRS.gpkg'
tms = morecantile.tms.get('WorldCRS84Quad')

# Read in any file
gdf = gpd.read_file(input_filename)
# Re-project to the CRS of the TMS
gdf_tmsCRS = gdf.to_crs(tms.crs)
# Save this reprojected GeoDataFrame to a geopackage file
gdf_tmsCRS.to_file(reproj_filename)
# Read in the reprojected file we just saved
gdf_tmsCRS_reopened = gpd.read_file(reproj_filename)
# Concatenate the version that was saved, & re-opened, with the one that wasn't, see error
pd.concat([gdf_tmsCRS, gdf_tmsCRS_reopened])

Compare the CRSs of the files before and after we save to file:

Before save

gdf_tmsCRS.crs
Name: WGS 84 (CRS84)
Axis Info [ellipsoidal]:
- Lon[east]: Geodetic longitude (degree)
- Lat[north]: Geodetic latitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

After save & reopen

gdf_tmsCRS_reopened.crs
Name: WGS 84 (CRS84)
Axis Info [ellipsoidal]:
- Lon[east]: Geodetic longitude (degree)
- Lat[north]: Geodetic latitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

The Area of Use and Datum do not match.

If we instead run the minimal example above with the TMS WGS1984Quad instead of WorldCRS84Quad, then the CRSs match and we are able to concatenate. The WGS1984Quad TMS uses EPSG:4326, while the the WorldCRS84Quad TMS uses the "Equirectangular Plate Carrée projection in the CRS84 CRS for the whole world" (https://docs.opengeospatial.org/is/17-083r2/17-083r2.html)

morecantile.tms.get('WorldCRS84Quad').crs.list_authority()
# [AuthorityMatchInfo(auth_name='OGC', code='CRS84', confidence=100)]
morecantile.tms.get('WGS1984Quad').crs.list_authority()
# [AuthorityMatchInfo(auth_name='EPSG', code='4326', confidence=100)]

I suppose that geopandas or its underlying packages do not fully support the projection used by WorldCRS84Quad. Cesium handles tiles made with either of these TMS's identically, and output also appears identical, therefore I suggest that we favour using WGS1984Quad. (@julietcohen @KastanDay). I will add a check before we concatenate to ensure that CRSs are the same, and reset to the TMS's CRS if needed so that we avoid this error.