geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
486 stars 45 forks

dtype('O') not supported since geopandas 0.13.0 #255

Open tmillenaar opened 1 year ago

tmillenaar commented 1 year ago

Hey all,

I noticed that dask_geopandas.from_dask_dataframe returns an object on which operations like to_crs and set_crs fail. The error I get is: AttributeError: 'DataFrame' object has no attribute '_meta_nonempty'

It turns out this is because my Dask DataFrame holds shapely objects in a column whose dtype is dtype('O'). The same code does work with geopandas 0.12.2.

A minimal example to reproduce:

import dask, pandas, geopandas, dask_geopandas, shapely
dask.__version__ # 2023.5.1
pandas.__version__ # 2.0.2
geopandas.__version__ # 0.13.0
dask_geopandas.__version__ # v0.3.1
shapely.__version__ # 2.0.1

points = [
    shapely.geometry.Point(0,0),
    shapely.geometry.Point(1,1),
]

df = pandas.DataFrame({"geometry": points})
ddf = dask.dataframe.from_pandas(df, npartitions=1)
dgdf = dask_geopandas.from_dask_dataframe(ddf)

# now set_crs and to_crs don't work
ddf.geometry.dtype # dtype('O')
hasattr(dgdf, "_meta_nonempty") # False
dgdf = dgdf.set_crs(4326) # AttributeError: 'DataFrame' object has no attribute '_meta_nonempty'

# note that the dtype of ddf.geometry is 'object' here
# What does work:
ddf = ddf.astype(geopandas.array.GeometryDtype())
dgdf = dask_geopandas.from_dask_dataframe(ddf)
dgdf = dgdf.set_crs(4326)

Going forward I will happily use ddf.astype(geopandas.array.GeometryDtype()) on my array of shapely objects. If dtype('O') was never officially supported and GeometryDtype is the intended approach, you can close this ticket.
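
A minimal sketch of that workaround as a reusable helper, shown on a plain pandas DataFrame so it runs without Dask. The name cast_geometry_column is hypothetical and not part of geopandas or dask-geopandas. It only relies on .dtype and .astype, so it should presumably also accept a Dask DataFrame before calling dask_geopandas.from_dask_dataframe, but that part is an assumption and is not exercised here.

```python
import pandas as pd
import shapely.geometry
from geopandas.array import GeometryDtype


def cast_geometry_column(df, column="geometry"):
    """Cast `column` to GeometryDtype unless it already has that dtype.

    Hypothetical helper for illustration; not a geopandas API.
    """
    if isinstance(df[column].dtype, GeometryDtype):
        # Already a geometry column, nothing to do.
        return df
    # Passing a dict casts only the named column and leaves others untouched.
    return df.astype({column: GeometryDtype()})


points = [shapely.geometry.Point(0, 0), shapely.geometry.Point(1, 1)]
df = pd.DataFrame({"geometry": points})

# A column of shapely objects starts out with the generic object dtype.
print(df["geometry"].dtype)

converted = cast_geometry_column(df)
# After the cast the column carries geopandas' geometry extension dtype.
print(converted["geometry"].dtype)
```

Casting per column rather than calling astype on the whole frame keeps the helper usable when the DataFrame also has non-geometry columns.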

If instead you do want to continue to support dtype('O') and you need more help or info, let me know.

Cheers, Timo