geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License
272 stars 22 forks source link

Empty geometry name in read_arrow metadata #204

Open kylebarron opened 1 year ago

kylebarron commented 1 year ago

Testing out 0.5.0

from pyogrio.raw import read_arrow
metadata, table = read_arrow('nationalpark.geojson')
metadata
# {'crs': 'EPSG:4326',
#  'encoding': 'UTF-8',
#  'fields': array(['OBJECTID', 'UNIT_CODE', 'GIS_Notes', 'UNIT_NAME', 'DATE_EDIT',
#         'STATE', 'REGION', 'GNIS_ID', 'UNIT_TYPE', 'CREATED_BY',
#         'METADATA', 'PARKNAME', 'GlobalID', 'Shape__Are', 'Shape__Len',
#         'index_right'], dtype=object),
#  'geometry_type': 'Unknown',
#  'geometry_name': ''}

It looks like the column is always named wkb_geometry?

nationalpark.geojson.zip

jorisvandenbossche commented 1 year ago

Yeah, that's something I've also run into when implementing this (see the second last bullet point in the top post of https://github.com/geopandas/pyogrio/pull/155). When converting the Arrow result (like above) to a geopandas.GeoDataFrame, we also assume it is "wkb_geometry" if not in the metadata:

https://github.com/geopandas/pyogrio/blob/d8ea90315d7a7922bb07b249804aefbc8409f39c/pyogrio/geopandas.py#L163

But so it is not always "wkb_geometry", it can be a custom name, and then it will be listed in the metadata. I am not sure why OGR_L_GetGeometryColumn does not always return the name (I intended to ask about this upstream in GDAL, but didn't yet get to that).

But maybe we should fill in the metadata with "wkb_geometry" when it is now an empty string.

kylebarron commented 1 year ago

Thanks! I should read the docs closer 😉

If it will always be "wkb_geometry" if not otherwise stated, I think it would be helpful for clarity to include that in the geometry_name field in the metadata, rather than leaving it blank.

jorisvandenbossche commented 7 months ago

I think including the "wkb_geometry" instead of empty string is still a good idea?

kylebarron commented 7 months ago

For my own purposes, I'll plan to rely on the GeoArrow metadata, but I'm +1 on adding the string to the meta

jorisvandenbossche commented 7 months ago

Ah, yes, good point. But still useful to do anyway for other use cases, so going to leave it open for now