geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

AttributeError: 'Series' (or `DataFrame`) object has no attribute 'to_crs' when using dask_geopandas to reproject to new CRS #315

Status: Open. dluks opened this issue 1 week ago

dluks commented 1 week ago

Description

Reprojecting a dask_geopandas.GeoSeries or dask_geopandas.GeoDataFrame to a new CRS fails. Calling set_crs or to_crs raises an AttributeError saying that the Series or DataFrame object has no such attribute.

I suspect this is related to the geometry being created with dask_geopandas.points_from_xy. However, the issue persists even when I compute the GeoDataFrame, save it to disk, and re-read it with dask_geopandas.read_parquet.

Code to reproduce

import pandas as pd
import dask.dataframe as dd
import dask_geopandas as dgpd

df = pd.DataFrame(
    {
        "value": [1, 2, 3],
        "decimallatitude": [34.05, 36.16, 40.71],
        "decimallongitude": [-118.24, -115.15, -74.00],
    }
)

ddf = dd.from_pandas(df, npartitions=2)

geom = dgpd.points_from_xy(
    ddf, "decimallongitude", "decimallatitude", crs="EPSG:4326"
)

ddf["geometry"] = geom

ddf = dgpd.from_dask_dataframe(ddf[["value", "geometry"]])

geom = geom.to_crs("EPSG:6933")  # <-- raises AttributeError
ddf = ddf.to_crs("EPSG:6933")  # <-- also raises AttributeError

Error Message

The following error is raised when attempting to use to_crs on geom alone:

AttributeError: 'Series' object has no attribute 'to_crs'

And on ddf:

AttributeError: 'DataFrame' object has no attribute 'to_crs'

Expected Behavior

I expect the to_crs method to reproject either a dask_geopandas.GeoSeries or dask_geopandas.GeoDataFrame to the new CRS without raising an error.
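
For reference, here is a minimal sketch of the same reprojection with plain, in-memory GeoPandas (no Dask involved), using the sample coordinates from the reproducer. It only illustrates the result expected from to_crs and says nothing about the dask_geopandas code path.

```python
import geopandas
import pandas as pd

df = pd.DataFrame(
    {
        "value": [1, 2, 3],
        "decimallatitude": [34.05, 36.16, 40.71],
        "decimallongitude": [-118.24, -115.15, -74.00],
    }
)

# Build point geometries in EPSG:4326; x is longitude, y is latitude.
geometry = geopandas.points_from_xy(
    df["decimallongitude"], df["decimallatitude"], crs="EPSG:4326"
)
gdf = geopandas.GeoDataFrame(df[["value"]], geometry=geometry)

# Reproject to EPSG:6933; the CRS of the result is the target CRS.
gdf_6933 = gdf.to_crs("EPSG:6933")

print(gdf.crs.to_epsg(), "->", gdf_6933.crs.to_epsg())
```

The Dask-backed GeoSeries and GeoDataFrame would be expected to end up with the same target CRS after to_crs.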

System Information

python: 3.11
dask-geopandas: 0.3.1
dask: 2024.9.0
pandas: 2.2.2
OS: Ubuntu 22.04.4 LTS

dluks commented 1 week ago

Here's the full stack trace:

---------------------------------------------------------------------------
AttributeError                          Traceback (most recent call last)
File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_core.py:470, in Expr.__getattr__(self, key)
    469 try:
--> 470     return object.__getattribute__(self, key)
    471 except AttributeError as err:

AttributeError: 'MapPartitions' object has no attribute 'set_crs'

During handling of the above exception, another exception occurred:

AttributeError                          Traceback (most recent call last)
File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_collection.py:620, in FrameBase.__getattr__(self, key)
    617 try:
    618     # Fall back to `expr` API
    619     # (Making sure to convert to/from Expr)
--> 620     val = getattr(self.expr, key)
    621     if callable(val):

File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_core.py:491, in Expr.__getattr__(self, key)
    490 link = "https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage"
--> 491 raise AttributeError(
    492     f"{err}\n\n"
    493     "This often means that you are attempting to use an unsupported "
    494     f"API function. Current API coverage is documented here: {link}."
    495 )

AttributeError: 'MapPartitions' object has no attribute 'set_crs'

This often means that you are attempting to use an unsupported API function. Current API coverage is documented here: https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage.

During handling of the above exception, another exception occurred:

AttributeError                          Traceback (most recent call last)
Cell In[41], line 1
----> 1 ddf.set_crs(4326)

File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_collection.py:3076, in DataFrame.__getattr__(self, key)
   3073     raise err
   3074 except AttributeError:
   3075     # Fall back to `BaseFrame.__getattr__`
-> 3076     return super().__getattr__(key)

File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_collection.py:626, in FrameBase.__getattr__(self, key)
    623     return val
    624 except AttributeError:
    625     # Raise original error
--> 626     raise err

File ~/miniforge3/envs/traits-py311/lib/python3.11/site-packages/dask_expr/_collection.py:615, in FrameBase.__getattr__(self, key)
    612 def __getattr__(self, key):
    613     try:
    614         # Prioritize `FrameBase` attributes
--> 615         return object.__getattribute__(self, key)
    616     except AttributeError as err:
    617         try:
    618             # Fall back to `expr` API
    619             # (Making sure to convert to/from Expr)

AttributeError: 'DataFrame' object has no attribute 'set_crs'

TomAugspurger commented 1 week ago

Thanks for the report. It looks like we just forgot to pass the crs through at https://github.com/geopandas/dask-geopandas/blob/33e2af89318173c4b3b403e5f3b430fd3fcf88db/dask_geopandas/expr.py#L912-L914. This diff seems to fix it:

diff --git a/dask_geopandas/core.py b/dask_geopandas/core.py
index b448319..4ef455c 100644
--- a/dask_geopandas/core.py
+++ b/dask_geopandas/core.py
@@ -878,7 +878,7 @@ def points_from_xy(df, x="x", y="y", z="z", crs=None):
         )

     return df.map_partitions(
-        func, x, y, z, meta=geopandas.GeoSeries(), token="points_from_xy"
+        func, x, y, z, meta=geopandas.GeoSeries(crs=crs), token="points_from_xy"
     )

diff --git a/dask_geopandas/expr.py b/dask_geopandas/expr.py
index a14e8bc..dcdafa8 100644
--- a/dask_geopandas/expr.py
+++ b/dask_geopandas/expr.py
@@ -910,7 +910,7 @@ def points_from_xy(df, x="x", y="y", z="z", crs=None):
         )

     return df.map_partitions(
-        func, x, y, z, meta=geopandas.GeoSeries(), token="points_from_xy"
+        func, x, y, z, meta=geopandas.GeoSeries(crs=crs), token="points_from_xy"
     )

Let us know if you're interested in making a PR with that change and a test.
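
The effect of the changed `meta` argument can be checked in plain GeoPandas, independent of Dask: an empty GeoSeries carries no CRS unless one is passed to the constructor. This is a minimal sketch, assuming "EPSG:4326" as a stand-in for whatever `crs` the caller supplies. It does not exercise dask_geopandas itself, so a real regression test would still need to call dask_geopandas.points_from_xy.

```python
import geopandas

crs = "EPSG:4326"  # stand-in for the `crs` argument of points_from_xy

# meta as constructed before the change: empty GeoSeries without a CRS
meta_before = geopandas.GeoSeries()

# meta as constructed after the change: empty GeoSeries carrying the CRS
meta_after = geopandas.GeoSeries(crs=crs)

print(meta_before.crs)
print(meta_after.crs.to_epsg())
```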

dluks commented 1 week ago

Sure, I'm happy to take a stab at it when I have some time in the next few days.