Open gcaria opened 2 years ago
Hey i did some exploring and found out some interesting stuff.
import geopandas as gpd
import numpy as np
import dask_geopandas
from shapely.geometry import box
poly = box(1,1,3,3)
N = 10_000_000
points = gpd.GeoDataFrame(geometry=gpd.points_from_xy(np.random.randn(N),np.random.randn(N)))
dpoints = dask_geopandas.from_geopandas(points, npartitions=16)
type(dpoints.within(poly))
# dask.dataframe.core.Series
type(dpoints[dpoints.within(poly).compute()])
# dask_geopandas.core.GeoDataFrame
spoints = gpd.GeoSeries(gpd.points_from_xy(np.random.randn(N),np.random.randn(N)), crs=4326)
gsdpoints = dask_geopandas.from_geopandas(spoints, npartitions=16)
type(gsdpoints.within(poly))
# dask.dataframe.core.Series
type(gsdpoints[dpoints.within(poly).compute()])
# dask.dataframe.core.Series
This only occurs when dealing with GeoSeries, as GeoDataFrame is working as expected. I found out something else too and it is that GeoSeries gets transformed back to dask.series even without boolean indexing but with some operation done on the series.
type(gsdpoints.apply(lambda x: x, meta=(None, 'geometry')))
# dask.dataframe.core.Series
I have tried to find something inside the _Frame super class, with no results. Maybe here we need someone from the core team to check it out. I'm using dask_geopandas==0.2.0
I agree. I wonder if the fix is as easy as what's done in geopandas
to avoid falling back to pandas
objects (see here)
def _wrapped_pandas_method(self, mtd, *args, **kwargs):
"""Wrap a generic pandas method to ensure it returns a GeoSeries"""
val = getattr(super(), mtd)(*args, **kwargs)
if type(val) == Series:
val.__class__ = GeoSeries
val.crs = self.crs
return val
def __getitem__(self, key):
return self._wrapped_pandas_method("__getitem__", key)
This makes sense, Geopandas has no wrapping method for GeoDataFrames, but it does, as you show, for GeoSeries.
I would be willing to work on a PR that addresses this problem in the coming days.
The title might not be accurate since it is mostly my guess at what's going on here, but I would expect to still have a
dask-geopandas
object after indexing.I'm using version 0.2.0 of
dask-geopandas
.