geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Register arrow import/export dispatch to make p2p shuffle work #295

Closed: jorisvandenbossche closed this 2 months ago

jorisvandenbossche commented 3 months ago

This registers our own implementation of the pyarrow Table <-> (Geo)DataFrame conversion through the dispatch mechanism. The distributed P2P shuffle algorithm relies on that conversion under the hood: it converts the data to pyarrow before sending it to other workers. With this change, and with the latest distributed release (we require the fix from https://github.com/dask/distributed/pull/8667), a distributed `spatial_shuffle()` now works.
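To illustrate the idea of type-based dispatch described above, here is a minimal, self-contained sketch. It is a toy model and not dask's or dask-geopandas' actual code: the `Dispatch` class, the `Frame` and `GeoFrame` stand-ins, and the dict-based "table" are all hypothetical names invented for this example. It only shows why registering a more specific implementation for the geo-aware type lets the geometry information survive the round trip through a generic transport format.

```python
# Toy illustration of type-based dispatch; NOT dask's actual implementation.
# All names below (Dispatch, Frame, GeoFrame, to_table) are hypothetical.


class Dispatch:
    """Minimal registry mapping a type to a conversion function."""

    def __init__(self, name):
        self.name = name
        self._lookup = {}

    def register(self, typ):
        # Used as a decorator: @dispatch.register(SomeType)
        def decorator(func):
            self._lookup[typ] = func
            return func

        return decorator

    def __call__(self, obj, *args, **kwargs):
        # Walk the MRO so the most specific registered type wins.
        for cls in type(obj).__mro__:
            if cls in self._lookup:
                return self._lookup[cls](obj, *args, **kwargs)
        raise TypeError(f"No {self.name} implementation for {type(obj).__name__}")


class Frame:
    """Stand-in for a plain dataframe."""

    def __init__(self, columns):
        self.columns = dict(columns)


class GeoFrame(Frame):
    """Stand-in for a geo-aware dataframe subclass."""

    geometry_column = "geometry"


to_table = Dispatch("to_table")


@to_table.register(Frame)
def _frame_to_table(df):
    # Generic path: knows nothing about geometry columns.
    return {"columns": df.columns, "metadata": {}}


@to_table.register(GeoFrame)
def _geoframe_to_table(df):
    # Geo-aware path: record which column holds geometries so the
    # receiving side could rebuild a GeoFrame instead of a plain Frame.
    return {"columns": df.columns, "metadata": {"geometry": df.geometry_column}}


plain = to_table(Frame({"a": [1, 2]}))
geo = to_table(GeoFrame({"a": [1, 2], "geometry": ["POINT (0 0)", "POINT (1 1)"]}))
```

Without the second registration, a `GeoFrame` would fall back to the generic `Frame` implementation via the MRO walk and the geometry metadata would be dropped, which is the kind of loss a library-specific registration is meant to avoid.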

jorisvandenbossche commented 2 months ago

Going to merge this, as I need it to get https://github.com/geopandas/dask-geopandas/pull/165 running with the latest versions of dask/distributed.