geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Method directly calls PyGEOS function, but GeoPandas is deprecating PyGEOS for Shapely #226

Closed gtmaskall closed 1 year ago

gtmaskall commented 1 year ago

A recent GeoPandas release issues a warning about PyGEOS:

/tmp/ipykernel_24453/103584512.py:4: UserWarning: Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas

In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
  import geopandas as gp
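
The warning's ordering requirement (set the variable before geopandas is imported) can be checked defensively. This is a minimal sketch using only the standard library; `force_shapely_engine` is a hypothetical helper name, not part of GeoPandas, and it assumes the variable is only read when geopandas is imported, as the warning text says.

```python
import os
import sys


def force_shapely_engine(loaded_modules=None):
    """Set USE_PYGEOS=0 and report whether the setting can still take effect.

    Hypothetical helper: returns False if geopandas was already imported,
    because (per the warning) the variable must be set before the import.
    """
    if loaded_modules is None:
        loaded_modules = sys.modules
    already_imported = "geopandas" in loaded_modules
    os.environ["USE_PYGEOS"] = "0"
    return not already_imported


if not force_shapely_engine():
    print("geopandas was imported first; restart the process for USE_PYGEOS=0 to apply")
```

Calling this at the very top of a script or notebook, before any geopandas import, is what makes the setting effective.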

So, being the type to look to the future, I set USE_PYGEOS to '0'. But I'm following the dask-geopandas tutorial on spatial partitioning, and calling the calculate_spatial_partitions method on my dask geodataframe produces:

ValueError: Metadata inference failed in `lambda`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
TypeError('One of the arguments is of incorrect type. Please provide only Geometry objects.')

Traceback:
---------
  File "/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/dask/dataframe/utils.py", line 182, in raise_on_meta_error
    yield
  File "/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/dask/dataframe/core.py", line 6375, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/dask_geopandas/core.py", line 140, in <lambda>
    pygeos.geometrycollections(part.geometry.values.data)
  File "/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/pygeos/decorators.py", line 80, in wrapped
    return func(*args, **kwargs)
  File "/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/pygeos/creation.py", line 467, in geometrycollections
    return lib.create_collection(geometries, typ, out=out, **kwargs)
And looking at the traceback, there's a clear direct call to a pygeos function:
File ~/anaconda3/envs/geocube/lib/python3.10/site-packages/dask_geopandas/core.py:140, in _Frame.calculate_spatial_partitions.<locals>.<lambda>(part)
    135 # TEMP method to calculate spatial partitions for testing, need to
    136 # add better methods (set_partitions / repartition)
    137 parts = geopandas.GeoSeries(
    138     self.map_partitions(
    139         lambda part: pygeos.convex_hull(
--> 140             pygeos.geometrycollections(part.geometry.values.data)
    141         )
    142     ).compute(),
    143     crs=self.crs,
    144 )
    145 self.spatial_partitions = parts
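
For illustration, the same per-partition computation can be written against Shapely 2.0, which provides `geometrycollections` and `convex_hull` under the same names as PyGEOS. With USE_PYGEOS set to '0' the geometry array holds Shapely objects rather than PyGEOS ones, which is presumably why the PyGEOS call rejects them. This is a sketch assuming Shapely >= 2.0 is installed; `partition_hull` is a hypothetical name, and this is not necessarily how dask-geopandas implements it.

```python
import numpy as np
import shapely  # assumes Shapely >= 2.0


def partition_hull(geometries):
    """Return the convex hull of all geometries in one partition.

    Hypothetical stand-in for the lambda in the traceback, using Shapely
    functions in place of the PyGEOS ones.
    """
    geoms = np.asarray(geometries, dtype=object)
    # Collect the whole partition into one GeometryCollection, then take its hull.
    return shapely.convex_hull(shapely.geometrycollections(geoms))


# Stand-in for one partition's geometry column: four corners and an interior point.
points = shapely.points([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
hull = partition_hull(points)
print(hull.geom_type, hull.area)
```

Because the function takes any array of Shapely geometries, a partition's geometry values could be passed in the same way as the sample points here.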

Is there a plan to migrate dask-geopandas to use shapely?

martinfleis commented 1 year ago

Is there a plan to migrate dask-geopandas to use shapely?

Yes, for sure, but at the moment dask-geopandas depends on pygeos directly, so not everything will work when you override the geometry engine. This will need a bit of careful release logic to ensure dask-geopandas still works with the changes on the geopandas side.

jorisvandenbossche commented 1 year ago

This was closed by https://github.com/geopandas/dask-geopandas/issues/226, and the fix is included in the just-released version 0.3.0.