SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Iris <-> GeoPandas interface? #6047

Open hsteptoe opened 1 month ago

hsteptoe commented 1 month ago

✨ Feature Request

Build interface for translating between Iris cubes and GeoPandas dataframes.

Motivation

GeoPandas is quickly becoming a key package for working with geospatial data in Python.

We have an Iris <-> pandas interface, but should this be extended to GeoPandas?

In principle we could do Iris <-> pandas <-> GeoPandas... but we could also make this more user-friendly.

Is this within scope of what Iris should do? Thoughts?

pp-mo commented 1 month ago

Thanks @hsteptoe, I do think this is interesting. We were all watching SciPy 2024 last week, and noted the prevalence of GeoPandas there. AFAICT there are some solutions out there for importing geopandas into xarray, notably geocube, which is noted in the xarray docs and ought to be usable via ncdata. But not, I think, in the reverse direction (i.e. writing xarray data to geopandas)?

trexfeathers commented 1 month ago

Just been chatting to @hsteptoe offline. He works with some software that insists on GeoPandas format, and I think there are enough other geo-referenced tabular formats (mostly relying on polygon information, it seems) that it's a space worth investigating.

I'm wondering about a callable utility that would add Shapely polygon information to a given Cube as an AuxCoord or AncillaryVariable. The existing iris.pandas interface could be programmed to detect this and handle it accordingly?
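As a rough illustration of the polygon idea, here is a minimal sketch of deriving one rectangular outline per grid cell from coordinate bounds. It assumes a rectilinear grid whose bounds arrive as (lower, upper) pairs, similar in shape to the bounds on a bounded Iris coordinate. The names `cell_rings` and `cells_to_geodataframe` are hypothetical and not part of Iris or GeoPandas, and attaching the result to a Cube is left open.

```python
from typing import List, Sequence, Tuple

Ring = List[Tuple[float, float]]


def cell_rings(x_bounds: Sequence[Tuple[float, float]],
               y_bounds: Sequence[Tuple[float, float]]) -> List[Ring]:
    """Return one closed ring per grid cell, in row-major (y, x) order."""
    rings = []
    for y0, y1 in y_bounds:
        for x0, x1 in x_bounds:
            # Corners are counter-clockwise when the bounds are ascending;
            # the first corner is repeated to close the ring.
            rings.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)])
    return rings


def cells_to_geodataframe(values, x_bounds, y_bounds, crs="EPSG:4326"):
    """Hypothetical helper: pair flattened cell values with cell polygons.

    `values` is assumed to be flattened in the same (y, x) order as the rings.
    """
    # Imported here so that cell_rings stays usable without geopandas installed.
    import geopandas as gpd
    from shapely.geometry import Polygon

    polygons = [Polygon(ring) for ring in cell_rings(x_bounds, y_bounds)]
    return gpd.GeoDataFrame({"value": list(values), "geometry": polygons}, crs=crs)
```

The ring-building step is kept free of third-party imports so that the geometry logic can be checked on its own; the default CRS is only an assumption and would need to come from the cube's coordinate system in practice.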

It would presumably also be possible to construct a grid from a series of polygons. This would be required for the reverse interoperability, and I know there are other use cases for this (@gcsima brought me one this year).
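For the reverse direction, a minimal sketch under strong assumptions: every polygon is an axis-aligned rectangle given as a ring of (x, y) tuples, and together they tile a complete rectilinear grid. The function name `grid_from_rings` is hypothetical, and irregular or curvilinear polygons would need a different approach.

```python
def grid_from_rings(rings):
    """Recover sorted x/y cell bounds from axis-aligned rectangular rings.

    Raises ValueError if the rectangles repeat a cell or do not tile a
    complete rectilinear grid.
    """
    cells = set()
    for ring in rings:
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        # Each rectangle is reduced to its (x-range, y-range) extent.
        cells.add(((min(xs), max(xs)), (min(ys), max(ys))))
    if len(cells) != len(rings):
        raise ValueError("duplicate cells found")

    x_bounds = sorted({xb for xb, _ in cells})
    y_bounds = sorted({yb for _, yb in cells})

    # A full grid contains every combination of x-range and y-range.
    expected = {(xb, yb) for xb in x_bounds for yb in y_bounds}
    if cells != expected:
        raise ValueError("polygons do not form a complete rectilinear grid")
    return x_bounds, y_bounds
```

The recovered bounds could then seed two bounded dimension coordinates; how the data values are reordered to match is not covered here.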

@hsteptoe might have some spare cycles to look into this, certainly earlier than the Iris maintainers could get to it.

hsteptoe commented 1 month ago

My instinct is to add an iris.geopandas module, mainly to respect the pandas vs geopandas difference. A pandas.DataFrame isn't automatically recognised as a geopandas.GeoDataFrame if it has a column of geometry information, so I don't think users should expect iris.pandas to do this.

geopandas also doesn't have a native method to take a pandas.DataFrame to a geopandas.GeoDataFrame, so we might as well write code to do iris.Cube <-> geopandas.GeoDataFrame, rather than going via a pandas.DataFrame as an intermediate step.

My API suggestion would be something like this (equivalent to iris.pandas):

>>> import iris
>>> from iris.geopandas import as_geo_data_frame
>>> import geopandas as gpd
>>> cube = iris.load_cube(path)
>>> gdf = as_geo_data_frame(cube)

trexfeathers commented 1 month ago

> I think my instinct is to add an iris.geopandas module

I could see the case for not having the existing routines 'magically' do two different things, but I'd still rather see any new routines put into iris.pandas, since they are such closely related concepts.

hsteptoe commented 1 month ago

OK, so from iris.pandas import as_geo_data_frame would be a reasonable compromise?

trexfeathers commented 1 month ago

> OK, so from iris.pandas import as_geo_data_frame would be a reasonable compromise?

Yes, that's the kind of thing I meant.

hsteptoe commented 1 month ago

trexfeathers commented 1 month ago

  • [ ] Need to work out why there seems to be a dependency conflict between geopandas and the pyvista, vtk and geovista packages.

https://github.com/SciTools/iris/issues/5517#issuecomment-1771315944