geopandas / dask-geopandas

Parallel GeoPandas with Dask
https://dask-geopandas.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

ENH: add crs keyword to from_dask_dataframe #189

Open jorisvandenbossche opened 2 years ago

jorisvandenbossche commented 2 years ago

When creating a GeoDataFrame from a dask dataframe, we could pass through the crs keyword to the underlying geopandas.GeoDataFrame constructor:

https://github.com/geopandas/dask-geopandas/blob/5b49377352658e15c34cb38a43b18fcc8f833ccb/dask_geopandas/core.py#L770-L783

That would avoid the extra set_crs step you currently need to get a dataframe with a CRS set:

gddf = dask_geopandas.from_dask_dataframe(
    ddf, geometry=dask_geopandas.points_from_xy(ddf, "lon", "lat"),
)
gddf = gddf.set_crs(4326)

martinfleis commented 2 years ago

Yeah, we could. However, you don't need an additional step as you can pass CRS to dask_geopandas.points_from_xy.

gddf = dask_geopandas.from_dask_dataframe(
    ddf, geometry=dask_geopandas.points_from_xy(ddf, "lon", "lat", crs=4326),
)

FlorisCalkoen commented 2 years ago

Great idea! I was just calling dask_geopandas.from_dask_dataframe() and was surprised that the crs keyword was not accepted. My next attempt was dask_geopandas.GeoDataFrame().set_geometry(col="geometry", crs="epsg:4326"), but that method doesn't accept crs either. Glad that gddf.set_crs(4326) works, but adding the crs keyword to both from_dask_dataframe and set_geometry would make sense to me :)