geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
498 stars 44 forks

ENH: preserve spatial partitioning information in more methods #63

Open jorisvandenbossche opened 3 years ago

jorisvandenbossche commented 3 years ago

We already preserve the spatial partitioning information (the spatial_partitions attribute) in several places (e.g. when selecting a subset of the columns in __getitem__ and in the boundary attribute, via a _propagate_spatial_partitions helper method). But there are more places where it could be preserved, either as is or in a slightly modified form.

Methods where it can be preserved as is:

Methods where it might be relatively straightforward to preserve it in a slightly modified form:

martinfleis commented 2 years ago

One to add: we lose spatial partitions after to_crs. I am just not sure whether we can simply reproject the partitions GeoSeries, or whether we risk that some points will fall outside the reprojected partitions in some corner cases.

jorisvandenbossche commented 2 years ago

I think that in general, that is not guaranteed to work. Basically if reprojected lines are no longer straight lines (which happens for many conversions), a reprojected bounding box will not necessarily contain all points anymore. Quick example:

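import geopandas
import shapely

# Bounding box in lon/lat, and the same box with extra vertices at most 5 degrees apart
box = shapely.box(0, 40, 20, 60)
poly = shapely.segmentize(box, 5)

# Reproject both: the plain box only moves its corners, the densified one follows the curved edges
reprojected = geopandas.GeoSeries([box, poly], crs="EPSG:4326").to_crs("EPSG:3035")

from shapely.plotting import plot_polygon
plot_polygon(reprojected[0])
plot_polygon(reprojected[1], color="C1")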
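To put a rough number on the effect without plotting, here is a minimal standard-library sketch. It is an approximation under stated assumptions, not what to_crs does internally: it uses the spherical Lambert azimuthal equal-area formulas centred at 10°E, 52°N (the same centre as EPSG:3035), on a sphere rather than the GRS80 ellipsoid and without the false easting/northing. The helper name laea and the constants are illustrative only. It reprojects just the four corners of the same box and checks whether a point on the southern edge still falls inside their bounding box.

```python
import math

R = 6_371_000.0          # mean Earth radius in metres (spherical approximation, an assumption)
LON0, LAT0 = 10.0, 52.0  # projection centre, chosen to match the centre of EPSG:3035


def laea(lon, lat):
    """Spherical Lambert azimuthal equal-area projection (degrees in, metres out)."""
    lam = math.radians(lon - LON0)
    phi, phi0 = math.radians(lat), math.radians(LAT0)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam))
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = R * k * math.cos(phi) * math.sin(lam)
    y = R * k * (math.cos(phi0) * math.sin(phi)
                 - math.sin(phi0) * math.cos(phi) * math.cos(lam))
    return x, y


# Partition bounds in lon/lat: the same box as in the example above
minx, miny, maxx, maxy = 0.0, 40.0, 20.0, 60.0
corners = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy)]

# Reproject only the four corners and take the southern limit of their bounding box
projected_corners = [laea(lon, lat) for lon, lat in corners]
bbox_min_y = min(y for _, y in projected_corners)

# A point in the middle of the southern edge, which lies within the original bounds
px, py = laea(10.0, 40.0)

# Positive overshoot means the point lies south of the reprojected corner bounding box
overshoot = bbox_min_y - py
print(f"point is outside the corner bbox: {py < bbox_min_y}")
print(f"overshoot in metres: {overshoot:.0f}")
```

Under these assumptions the midpoint of the southern edge ends up tens of kilometres south of the box spanned by the reprojected corners, because the 40°N parallel maps to a curve that bulges away from the projection centre. Densifying the partition polygons before reprojecting (as segmentize does above) shrinks this gap, but it does not by itself guarantee containment.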
