geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Can `GeoDataFrame.crs` be set to `None`? #296

Closed amano-takahisa closed 1 month ago

amano-takahisa commented 2 months ago

I would like to use dask_geopandas.GeoDataFrame for non-geospatial data as well. Therefore, I tried to drop the CRS by assigning None to GeoDataFrame.crs as follows, which worked in geopandas.

import dask_geopandas as dask_gpd
import geopandas as gpd
from shapely import Point

d = {
    'col1': ['name1', 'name2'],
    'geometry': [Point(1, 2), Point(2, 1)],
}

gdf = gpd.GeoDataFrame(d, crs='EPSG:4326')
dask_gdf = dask_gpd.from_geopandas(gdf)
dask_gdf.crs = None

The above raised the following.

Traceback (most recent call last):
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask/dataframe/utils.py", line 195, in raise_on_meta_error
    yield
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_expr.py", line 3987, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_geopandas/expr.py", line 104, in _set_crs
    return df.set_crs(crs, allow_override=allow_override)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/geopandas/geodataframe.py", line 1325, in set_crs
    df.geometry = df.geometry.set_crs(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/geopandas/geoseries.py", line 1080, in set_crs
    raise ValueError("Must pass either crs or epsg.")
ValueError: Must pass either crs or epsg.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_collection.py", line 3029, in __setattr__
    object.__setattr__(self, key, value)
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_geopandas/expr.py", line 267, in crs
    new = self.set_crs(value, allow_override=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_geopandas/expr.py", line 273, in set_crs
    new = self.map_partitions(
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_collection.py", line 1090, in map_partitions
    return map_partitions(
           ^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_collection.py", line 6106, in map_partitions
    return new_collection(new_expr)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_collection.py", line 4764, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/usr/lib/python3.12/functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_expr.py", line 630, in _meta
    return _get_meta_map_partitions(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_expr.py", line 4001, in _get_meta_map_partitions
    meta = _emulate(func, *a, udf=True, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_expr.py", line 3986, in _emulate
    with raise_on_meta_error(funcname(func), udf=udf):
  File "/usr/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask/dataframe/utils.py", line 216, in raise_on_meta_error
    raise ValueError(msg) from e
ValueError: Metadata inference failed in `_set_crs`.

You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
ValueError('Must pass either crs or epsg.')

Traceback:
---------
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask/dataframe/utils.py", line 195, in raise_on_meta_error
    yield
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_expr/_expr.py", line 3987, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/dask_geopandas/expr.py", line 104, in _set_crs
    return df.set_crs(crs, allow_override=allow_override)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/geopandas/geodataframe.py", line 1325, in set_crs
    df.geometry = df.geometry.set_crs(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/work/.venv/lib/python3.12/site-packages/geopandas/geoseries.py", line 1080, in set_crs
    raise ValueError("Must pass either crs or epsg.")

Is there a way to remove a CRS that has already been set?

My environment was as follows.

$ pip list
Package          Version
---------------- -----------------------
attrs            23.2.0
certifi          2024.6.2
click            8.1.7
click-plugins    1.1.1
cligj            0.7.2
cloudpickle      3.0.0
dask             2024.5.2
dask-expr        1.1.2
dask-geopandas   0+untagged.162.gaa1b52f
distributed      2024.5.2
fiona            1.9.6
fsspec           2024.6.0
geopandas        0.14.4
Jinja2           3.1.4
locket           1.0.0
MarkupSafe       2.1.5
msgpack          1.0.8
numpy            1.26.4
packaging        24.1
pandas           2.2.2
partd            1.4.2
pip              24.0
psutil           5.9.8
pyarrow          16.1.0
pyproj           3.6.1
python-dateutil  2.9.0.post0
pytz             2024.1
PyYAML           6.0.1
shapely          2.0.4
six              1.16.0
sortedcontainers 2.4.0
tblib            3.0.0
toolz            0.12.1
tornado          6.4.1
tzdata           2024.1
urllib3          2.2.1
zict             3.0.0

$ python -V
Python 3.12.3

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

toihr commented 1 month ago

Bump, I am having the same problem. My use case does not need a CRS, as it is a purely planar calculation within an image where the projection doesn't matter.

martinfleis commented 1 month ago

I believe that the snippet above no longer raises an error with the latest versions of dask-geopandas and geopandas, though I would generally suggest using set_crs instead.

gdf = gpd.GeoDataFrame(d, crs='EPSG:4326')
dask_gdf = dask_gpd.from_geopandas(gdf).set_crs(None, allow_override=True)

@toihr are you using the latest geopandas and dask-geopandas?
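
For readers on mixed versions, the set_crs suggestion above can be wrapped in a small helper. This is only a sketch: drop_crs is a hypothetical name, not part of geopandas or dask-geopandas, and it assumes nothing beyond the object having a set_crs(crs, allow_override=...) method, which both libraries provide. On older versions the call raises ValueError (as in the traceback above), so the helper re-raises with a hint.

```python
def drop_crs(frame):
    """Return `frame` with its CRS removed.

    `frame` may be a geopandas or dask-geopandas GeoDataFrame/GeoSeries;
    the helper only relies on the `set_crs` method both expose.
    `drop_crs` itself is a hypothetical helper for illustration.
    """
    try:
        # allow_override=True is needed because a CRS is already set.
        return frame.set_crs(None, allow_override=True)
    except ValueError as exc:
        # Older versions reject crs=None ("Must pass either crs or epsg.");
        # dask wraps that in another ValueError during metadata inference.
        raise RuntimeError(
            "Could not remove the CRS; an upgrade of geopandas and "
            "dask-geopandas may be required."
        ) from exc
```

With the objects from the original report, this would be called as drop_crs(dask_gpd.from_geopandas(gdf)).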

toihr commented 1 month ago

I am on version 0.4.1 of dask-geopandas, but for some reason I am on geopandas 0.14.3, which seems odd. Thanks for pointing that out.

martinfleis commented 1 month ago

Yeah, I believe that you will need the changes we made in set_crs in geopandas 1.0 to make this work.
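
A quick way to check whether an environment meets the geopandas 1.0 requirement mentioned above is sketched below, using only the standard library. The function names (release_tuple, supports_crs_removal, installed_version) are hypothetical, and the parser is deliberately simplified: it reads only the leading numeric components, so a pre-release such as 1.0.0rc1 is treated like 1.0.0.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def supports_crs_removal(geopandas_version):
    """True if the version is 1.0 or newer (the threshold named in this thread)."""
    major_minor = (release_tuple(geopandas_version) + (0, 0))[:2]
    return major_minor >= (1, 0)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For example, supports_crs_removal("0.14.3") is False, while supports_crs_removal("1.0.1") is True; installed_version("geopandas") returns whatever is installed in the current environment.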

toihr commented 1 month ago

Yeah, I think that might help. Thank you very much. I think you can close this issue then.