Failing on GeoSeries since the `column` argument is only valid for DataFrame. Can you define it on Series and DataFrame separately like in Dask?
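For illustration, a minimal sketch of what the split could look like, written here as standalone helper functions rather than methods on the dask-geopandas classes; the helper names and the `map_partitions` delegation are assumptions for the sketch, not the PR's actual implementation:

```python
import dask_geopandas
import geopandas
from shapely.geometry import MultiPoint

# Hypothetical helpers illustrating the split: the GeoSeries variant has no
# `column` argument, while the GeoDataFrame variant accepts it.

def explode_geoseries(ds, ignore_index=False, index_parts=False):
    """Explode a dask_geopandas.GeoSeries partition by partition."""
    return ds.map_partitions(
        lambda s: s.explode(ignore_index=ignore_index, index_parts=index_parts)
    )

def explode_geodataframe(ddf, column=None, ignore_index=False, index_parts=False):
    """Explode a dask_geopandas.GeoDataFrame partition by partition."""
    return ddf.map_partitions(
        lambda df: df.explode(
            column=column, ignore_index=ignore_index, index_parts=index_parts
        )
    )

# Minimal usage sketch
s = geopandas.GeoSeries([MultiPoint([(0, 0), (1, 1)])])
ds = dask_geopandas.from_geopandas(s, npartitions=1)
print(explode_geoseries(ds).compute())
```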
The CI failure looks unrelated. I'll take a look at that later.
It might be worth adding a small unit test, just to verify that things are passing through correctly (and to make sure they don't break in the future).
Thanks for the tip on defining them separately. I'll look at creating a unit test.
Would I add that as a new function in `test_core.py` or as additional code in the existing functions? I haven't done much with unit tests.
Yes, a new function. You might be able to adapt a geopandas test like https://github.com/geopandas/geopandas/blob/dc695ee93113af5d1a93a5740ecd792c2b9d19b3/geopandas/tests/test_geom_methods.py#L1003.
The unit tests for the `ignore_index` argument are failing. With `ignore_index` set to `True`, the index is reset for each map partition, so the resulting GeoDataFrame has a non-unique index that doesn't match the vanilla GeoPandas call, e.g. with:
```python
import geopandas
import dask_geopandas
from shapely.geometry import MultiPoint


def test_explode_geoseries_ignore_index():
    s = geopandas.GeoSeries(
        [MultiPoint([(0, 0), (1, 1)]), MultiPoint([(2, 2), (3, 3), (4, 4)])]
    )
    # Change index from range index to test the ignore_index argument
    s.index = [3, 4]
    original = s.explode(ignore_index=True)
    dask_s = dask_geopandas.from_geopandas(s, npartitions=2)
    daskified = dask_s.explode(ignore_index=True)
    assert isinstance(daskified, dask_geopandas.GeoSeries)
    assert all(original == daskified.compute())
```
returns the error:

```
ValueError: Can only compare identically-labeled Series objects

self =
0    POINT (0.00000 0.00000)
1    POINT (1.00000 1.00000)
2    POINT (2.00000 2.00000)
3    POINT (3.00000 3.00000)
4    POINT (4.00000 4.00000)
dtype: geometry

other =
0    POINT (0.00000 0.00000)
1    POINT (1.00000 1.00000)
0    POINT (2.00000 2.00000)
1    POINT (3.00000 3.00000)
2    POINT (4.00000 4.00000)
```
My inclination is to reset the index after calling compute on the Dask GeoSeries:

```python
assert all(original == daskified.compute().reset_index(drop=True))
```

but I'm not familiar enough with unit testing and Dask to know if that's an appropriate way to handle it.
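An alternative would be to build the expected result manually with the per-partition index instead of resetting the computed one. A rough sketch of that idea (not the exact test that was merged):

```python
import geopandas
import dask_geopandas
from shapely.geometry import MultiPoint
from geopandas.testing import assert_geoseries_equal


def test_explode_geoseries_ignore_index_expected_index():
    s = geopandas.GeoSeries(
        [MultiPoint([(0, 0), (1, 1)]), MultiPoint([(2, 2), (3, 3), (4, 4)])],
        index=[3, 4],
    )
    expected = s.explode(ignore_index=True)
    # With two partitions, ignore_index=True resets the index inside each
    # partition, so the concatenated result has index [0, 1, 0, 1, 2].
    expected.index = [0, 1, 0, 1, 2]

    dask_s = dask_geopandas.from_geopandas(s, npartitions=2)
    result = dask_s.explode(ignore_index=True).compute()
    assert_geoseries_equal(result, expected)
```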
Getting a lot of test failures after refreshing my fork. Looks like something with pyarrow strings?
> Getting a lot of test failures after refreshing my fork. Looks like something with pyarrow strings?
Apologies for the slow follow-up. Indeed those failures were caused by something else, which has been fixed in the meantime.
I also added the equivalent change to the new dask-expr based implementation.
> The unit tests for the `ignore_index` argument are failing.
Your fix to create the expected result manually with the different index was still failing for older geopandas or dask versions, so I just added a skip for those older versions, as I don't think it is worth complicating the test for that.
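For reference, such a skip could look roughly like the following; the version cutoffs below are placeholders, not necessarily the pins used in the actual test suite:

```python
import dask
import geopandas
import pytest
from packaging.version import Version

# Placeholder version cutoffs; the real test would pin whatever versions
# first support explode(ignore_index=...) consistently.
GEOPANDAS_OK = Version(geopandas.__version__) >= Version("0.11.0")
DASK_OK = Version(dask.__version__) >= Version("2022.6.0")


@pytest.mark.skipif(
    not (GEOPANDAS_OK and DASK_OK),
    reason="explode(ignore_index=...) behaves differently on older geopandas/dask",
)
def test_explode_geoseries_ignore_index():
    ...
```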
@ebkurtz Thanks again for the PR and apologies it took so long to get merged!
Expose the `index_parts`, `column`, and `ignore_index` parameters in the `.explode()` method to mirror the GeoPandas API. This allows suppressing the FutureWarning for `index_parts`, see #245
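A small usage sketch of the exposed parameters (assuming the call mirrors the GeoPandas signature as described above; the data here is purely illustrative):

```python
import geopandas
import dask_geopandas
from shapely.geometry import MultiPoint

gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[MultiPoint([(0, 0), (1, 1)]), MultiPoint([(2, 2), (3, 3)])],
)
dgdf = dask_geopandas.from_geopandas(gdf, npartitions=2)

# Passing index_parts explicitly avoids the FutureWarning about its default;
# ignore_index=True resets the index within each partition.
exploded = dgdf.explode(column="geometry", index_parts=False, ignore_index=True)
print(exploded.compute())
```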