geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
486 stars 45 forks source link

Support latest dask.dataframe with query planning (dask-expr) #284

Open jorisvandenbossche opened 5 months ago

jorisvandenbossche commented 5 months ago

The 2024.2.0 release of Dask has deprecated the "legacy" dask.dataframe API. They are building a new API with "Query Planning" based on new high-level expressions (instead of the current HighLevelGraph), which currently lives in the dask-expr package. It can be enabled through dask.config.set({'dataframe.query-planning': True}).

The upcoming 2024.3.0 release (https://github.com/dask/community/issues/366) plans to enable this by default (but you can still turn it off for now).

Given that dask-geopandas is tightly integrated with dask.dataframe, we will have to update to use the new dask_expr backend. For released versions of dask-geopandas, users will have to upper pin dask or disable the option for now.

(for reference, a similar issue on the cudf side https://github.com/rapidsai/cudf/issues/15027 (they also have a subclass), and their PR to add basic support: https://github.com/rapidsai/cudf/pull/14805)

ReptarK commented 4 months ago

Hi, any updates on the integration of dask_expr ?

jorisvandenbossche commented 4 months ago

An initial patch supporting dask with query planning was merged in https://github.com/geopandas/dask-geopandas/pull/285. So for now using the latest dask-geopandas from the main branch (not yet released, so something like pip install git+https://github.com/geopandas/dask-geopandas.git) should work with the latest dask releases.

jorisvandenbossche commented 2 months ago

This preliminary support has been released in dask-geopandas 0.4.0 (https://github.com/geopandas/dask-geopandas/releases/tag/v0.4.0)