geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License
272 stars 22 forks source link

`list_layers` inconsistent with fiona and ogrinfo #275

Closed aw-west-defra closed 1 year ago

aw-west-defra commented 1 year ago

I an getting multiple layers in my dataset when there is only one.

pyogrio

>>> pyogrio.list_layers(filepath)
array([['lnd_fts_land', 'Polygon']], dtype=object)

fiona

>>> fiona.listlayers(filepath)
['lnd_fts_land']

ogrinfo

>>> ogrinfo $filepath
INFO: Open of `/path/to/file.gpkg'
      using driver `GPKG' successful.
1: lnd_fts_land (Polygon)
martinfleis commented 1 year ago

You're not. You're getting one layer (lnd_fts_land) of a 'Polygon' geometry type. pyogrio returns a 2d array with name, geom_type information per layer.

jorisvandenbossche commented 1 year ago

It might be clearer if we returned a 1D array of tuples ..

aw-west-defra commented 1 year ago

Ahh, this make much more sense when I use a dataset with more layers.

Although because it doesn't return a list would this edit be more appropriate, and consistent with user expectations.
(The edit renames to get_layers, and list_layers returns a list of just the names.)

brendan-ward commented 1 year ago

The intent of list_layers was to provide output relatively similar to what you get from the ogrinfo command at the data source level, which lists both the layer name and layer type because the layer type provides useful information:

❯ ogrinfo -so pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp
INFO: Open of `pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp'
      using driver `ESRI Shapefile' successful.
1: naturalearth_lowres (Polygon)

❯ ogrinfo -so pyogrio/tests/fixtures/sample.osm.pbf
INFO: Open of `pyogrio/tests/fixtures/sample.osm.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)

I think the docstring and examples for list_layers are reasonably clear as to return values. Fiona and Pyogrio both use GDAL under the hood and we've aspired where possible / appropriate to keep things compatible to make it easier to switch between them in GeoPandas, but we do not make any promises that similarly-named functions are 100% the same as Fiona. In some cases, the compatibility is driven from the GeoPandas side, either by GeoPandas providing an API that abstracts away the difference between the two, or by trying to get Pyogrio or Fiona to provide something that will make that easier (e.g., https://github.com/geopandas/geopandas/issues/2837).

I'm not clear on what is broken here though? Is it that returning an an array of shape n,2 is less clear than necessary, or is it that it would be nice to have a convenience function that only returns the layer names and nothing else? If the latter, maybe we could add list_layer_names, and we mention it is equivalent to Fiona's listlayers?