Thanks for the report. Do you happen to have a minimal reproducible example you can share?
Sure. I created a new environment:
python3.11 -m venv env
Then activated it:
source env/bin/activate
Then installed dask-geopandas and its dependencies:
pip install dask-geopandas
Then I run the following code in the environment:
from dask_geopandas import from_geopandas
from geopandas import GeoDataFrame
from shapely import box
shape = box(-74.5, -74.0, 4.5, 5.0)
shape = GeoDataFrame(geometry=[shape])
shape = from_geopandas(shape, npartitions=1)
shape = shape.spatial_shuffle()
shape.sjoin(shape).compute()
This produces the error I mentioned above. I'm running this on an Apple M1X, but I get the same issue in a Linux Docker container that is built remotely.
I can reproduce it. The tuple in question is ('set_geometry-49fab61a205e66c3ff77166c562d931f', 0).
Having the same issue here, +1! Thanks for looking into it!
I would like to contribute to fixing this. Can anyone guide me a little? I already have a failing test running. Much appreciated!
I was hoping to look into this sometime this week too. I think two paths forward are to
(name, partition_number)
isn't being properly converted to a data value. I'm not sure which of these is more promising.
https://github.com/dask/dask-expr/issues/1129 has the reproducer. The key was a .shuffle() call: without that .shuffle (or .spatial_shuffle), the error doesn't reproduce.
Hello,
I'm getting an error when I try to use sjoin after using spatial_shuffle.
Here are the packages in my environment (I ran pip install dask-geopandas and pip install geodatasets in a fresh venv):
Then I run the following code:
and get the following error:
If I remove the spatial_shuffle, the code runs as expected.
I've dug around a bit to try and understand what's going on. If I disable query planning in dask before I import dask-geopandas, the code runs:
It looks like something in the condition on line 117 of sjoin.py is causing the issue: https://github.com/geopandas/dask-geopandas/blob/d84e29902b1ec43522c397f8086eebf1ec90182d/dask_geopandas/sjoin.py#L117
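The workaround is to disable query planning. A minimal sketch, assuming a dask release in which query planning can still be switched off (i.e. one that still ships the legacy DataFrame implementation), is to set the dataframe.query-planning option before dask is imported for the first time. The sketch uses the environment-variable form, so it does not need to import dask itself. Calling dask.config.set({"dataframe.query-planning": False}) before importing dask_geopandas should have the same effect.

```python
import os

# Assumption: the installed dask still includes the legacy (non-query-planning)
# DataFrame implementation. dask reads DASK_* environment variables when it is
# first imported; a double underscore marks a nested key, so this variable
# maps to the "dataframe.query-planning" config option.
os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"

# Import dask_geopandas only *after* the variable is set; if dask was already
# imported in this process, the setting may not take effect.
# import dask_geopandas
```

Newer dask releases may ignore or reject this setting once the legacy implementation is removed. Treat it as a temporary workaround rather than a fix for the sjoin condition itself.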