geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

AttributeError: 'DataFrame' object has no attribute 'within' #289

Open TommasoCanc opened 4 months ago

TommasoCanc commented 4 months ago

Dear all,

Could someone help me resolve the error AttributeError: 'DataFrame' object has no attribute 'within'?

I'm attempting to replicate the example from https://hamedalemo.github.io/advanced-geo-python/lectures/dask_geopandas_intro.html. However, when I execute the line dpoints.within(box), Python raises the following error: AttributeError: 'DataFrame' object has no attribute 'within'.

Cheers, Tom

TomAugspurger commented 4 months ago

What version of dask (and dask-expr, maybe) do you have installed?
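A quick way to collect the versions asked about here is sketched below. It uses only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the ones reported elsewhere in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the package lists posted in this thread.
PACKAGES = ["dask", "dask-expr", "dask-geopandas", "geopandas"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<16}{ver or 'not installed'}")
```

Pasting the printed lines into a comment gives the same information as a `conda list` or `pip list` excerpt.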

TommasoCanc commented 4 months ago

Here are the package versions:

dask 2024.4.1
dask-geopandas 0.3.1

Thanks!

anastassiavybornova commented 4 months ago

I'm running into a similar issue, just trying to run the example code from the dask-geopandas docs:

import geopandas
import dask_geopandas

df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
dask_df = dask_geopandas.from_geopandas(df, npartitions=4)

dask_df.geometry.area.compute()

throws the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_core.py:459, in Expr.__getattr__(self, key)
    458 try:
--> 459     return object.__getattribute__(self, key)
    460 except AttributeError as err:

AttributeError: 'Projection' object has no attribute 'area'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_collection.py:618, in FrameBase.__getattr__(self, key)
    615 try:
    616     # Fall back to `expr` API
    617     # (Making sure to convert to/from Expr)
--> 618     val = getattr(self.expr, key)
    619     if callable(val):

File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_core.py:480, in Expr.__getattr__(self, key)
    479 link = "https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage"
--> 480 raise AttributeError(
    481     f"{err}\n\n"
    482     "This often means that you are attempting to use an unsupported "
    483     f"API function. Current API coverage is documented here: {link}."
...
    615         try:
    616             # Fall back to `expr` API
    617             # (Making sure to convert to/from Expr)

AttributeError: 'Series' object has no attribute 'area'

The above was run in a Jupyter notebook with a conda environment inside VS Code. My package versions:

dask                      2024.4.2           pyhd8ed1ab_0    conda-forge
dask-core                 2024.4.2           pyhd8ed1ab_0    conda-forge
dask-expr                 1.0.12             pyhd8ed1ab_0    conda-forge
dask-geopandas            0.3.1              pyhd8ed1ab_1    conda-forge
geopandas                 0.14.3             pyhd8ed1ab_0    conda-forge

anastassiavybornova commented 4 months ago

I get comparable error messages (attribute errors from dask-expr) when I try to use within or sjoin on a dask dataframe created with dask_geopandas.from_geopandas(geopandas.GeoDataFrame). For example:

import numpy as np
import geopandas
import dask_geopandas
N = 100
points1 = dask_geopandas.from_geopandas(
    geopandas.GeoDataFrame(
        geometry=geopandas.points_from_xy(
            np.random.randn(N),np.random.randn(N)
            )
        )
    )
points2 = dask_geopandas.from_geopandas(
    geopandas.GeoDataFrame(
        geometry=geopandas.points_from_xy(
            np.random.randn(N),np.random.randn(N)
            )
        )
    )
points1.sjoin(points2)

throws the error message

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_core.py:459, in Expr.__getattr__(self, key)
    458 try:
--> 459     return object.__getattribute__(self, key)
    460 except AttributeError as err:

AttributeError: 'FromPandas' object has no attribute 'sjoin'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_collection.py:618, in FrameBase.__getattr__(self, key)
    615 try:
    616     # Fall back to `expr` API
    617     # (Making sure to convert to/from Expr)
--> 618     val = getattr(self.expr, key)
    619     if callable(val):

File ~/anaconda3/envs/hightest/lib/python3.12/site-packages/dask_expr/_core.py:480, in Expr.__getattr__(self, key)
    479 link = "https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage"
--> 480 raise AttributeError(
    481     f"{err}\n\n"
    482     "This often means that you are attempting to use an unsupported "
    483     f"API function. Current API coverage is documented here: {link}."
...
    615         try:
    616             # Fall back to `expr` API
    617             # (Making sure to convert to/from Expr)

AttributeError: 'DataFrame' object has no attribute 'sjoin'

I'm not sure whether the error is in how I'm trying to use sjoin here, but the same code that now throws these errors worked with older package versions, in particular:

>>> geopandas.__version__
'0.11.1'
>>> dask.__version__
'2022.7.1'
>>> dask_geopandas.__version__
'v0.2.0'

martinfleis commented 4 months ago

The dask-expr support is still in a PR (#285), so I would not expect the stable dask-geopandas release to work with it without issues. You should be able to manually fall back to the old engine using dask.config.set({'dataframe.query-planning': False}).
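The fallback described above can be sketched as follows. This is a minimal sketch, not a confirmed fix: it assumes the option is read when the dataframe modules are first imported, so it sets the option before dask_geopandas is imported. The helper name `disable_query_planning` is hypothetical, and the import guard only keeps the snippet runnable where dask is missing.

```python
try:
    import dask
except ImportError:
    # dask is not installed in this environment; nothing to configure.
    dask = None


def disable_query_planning():
    """Request the legacy dataframe engine and return the value dask reports."""
    if dask is None:
        return None
    dask.config.set({"dataframe.query-planning": False})
    return dask.config.get("dataframe.query-planning")


setting = disable_query_planning()
print("dataframe.query-planning:", setting)

# Assumption: the dataframe-based libraries should be imported only after
# the option has been set, for example:
# import dask_geopandas
```

Whether this is enough depends on the installed versions; a later comment in this thread reports that the error persisted even with the option reported as False.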

anastassiavybornova commented 4 months ago

@martinfleis thanks for this! Unfortunately I get the same error message (and, from what I can see, the config is already set to False at import), i.e.:

import dask
import geopandas
import dask_geopandas

print(dask.config.config["dataframe"]["query-planning"])
dask.config.set({'dataframe.query-planning': False})
print(dask.config.config["dataframe"]["query-planning"])

df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
dask_df = dask_geopandas.from_geopandas(df, npartitions=4)

dask_df.geometry.area.compute()

This prints False twice and then throws the same error message as above. Are there any other configs that come to mind for falling back to the old engine? 😅

amano-takahisa commented 3 months ago

@anastassiavybornova I had the same error, AttributeError: 'Series' object has no attribute 'area', in my venv.

I was using pip/uv to build the environment, not conda. I found that the error occurs when dask[dataframe] is included. I stopped using dask[complete] and restricted the install to the subset I needed, such as dask[array], and it worked without error. I hope this helps.
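One way to check whether an environment is in the situation described above is to test whether the dask_expr module is importable. The sketch below assumes that installing dask[dataframe] or dask[complete] is what brings dask-expr into the environment; the helper name `is_importable` is hypothetical.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("dask_expr"):
        print("dask_expr is installed; dask may use the new dataframe engine.")
    else:
        print("dask_expr is not installed.")
```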

anastassiavybornova commented 3 months ago

Will try it out next time!

amano-takahisa commented 3 months ago

I could also use dask[dataframe] if I installed dask-geopandas from the main branch of this repository. I added the following to the dependencies in my project's pyproject.toml until a new version of this library is published to PyPI.

"dask-geopandas @ git+https://github.com/geopandas/dask-geopandas@main"

I'm looking forward to the new release ;)