geoarrow / geoarrow-python

Python implementation of the GeoArrow specification
http://geoarrow.org/geoarrow-python/
Apache License 2.0

Writing GeoParquet from GeoPandas using GeoArrow encoded columns? #24

Closed by fsvenson 11 months ago

fsvenson commented 11 months ago

I've been trying to figure out whether this package can write GeoParquet files with GeoArrow geometry columns, but I haven't found a way yet. Is this supported, or will support come with GeoParquet 1.1, when GeoArrow is supposed to be supported in that spec?

If that is the case, what other ways do we have of writing GeoArrow columns to GeoParquet? Is translating via ogr2ogr and GEOMETRY_ENCODING=GEOARROW the best option?

paleolimbot commented 11 months ago

In GeoParquet (currently), the only option is WKB. geoarrow.pyarrow.as_wkb() will happily do that conversion for you! We haven't implemented the part where we add GeoParquet metadata based on the geoarrow type information (maybe it will eventually be geoarrow.pyarrow.io.write_geoparquet()).

In the meantime, you can also just use pyarrow.parquet.write_table() (and read back in using pyarrow.parquet.read_table() after importing geoarrow via import geoarrow.pyarrow as ga), which will keep all the geospatial types and metadata. I might avoid ogr2ogr with GEOMETRY_ENCODING=GEOARROW for now because it uses a preliminary version of the specification (hopefully we will fix that soon!).

jorisvandenbossche commented 11 months ago

For the question in the title, i.e. how to do this from GeoPandas, you can indeed use pyarrow.parquet.write_table right now, but for that you first need to convert the GeoDataFrame to a pyarrow.Table using a geoarrow extension type. That's a conversion that we still have to implement properly (using shapely to get the coordinates), but probably the easiest workaround for now is to use geopandas to convert to WKB (GeoDataFrame.to_wkb()), then convert to pyarrow, and then use geoarrow.pyarrow to convert the WKB column to geoarrow.

Eventually we certainly also want to add this directly to geopandas.GeoDataFrame.to_parquet().

paleolimbot commented 11 months ago

> the easiest workaround for now is using geopandas to convert to WKB

This workaround is implemented as geoarrow.pyarrow.to_geopandas() at the moment (which also takes care of propagating the CRS).

fsvenson commented 11 months ago

Thanks for the quick and thorough feedback, that should put me on the right track! Looking forward to future updates to this ecosystem!