geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License
272 stars 22 forks source link

TST: expand testing with Arrow reader #297

Closed jorisvandenbossche closed 1 year ago

jorisvandenbossche commented 1 year ago

Starting to parametrize some more tests (xref https://github.com/geopandas/pyogrio/issues/278).

Discovered one issue with invalid SQL WHERE attribute filter not raising an error for GPKG (https://github.com/OSGeo/gdal/issues/8492)

jorisvandenbossche commented 1 year ago

Something else that isn't supported by the ArrowStream reader, in addition to force_2d option, is the automatic conversion of Surface geometries to linear ones, tested by test_read_multisurface (support for this was added in https://github.com/geopandas/pyogrio/pull/140). But of those apply a transformation on the ogr_geometry object before converting this to WKB.

theroggy commented 1 year ago

Something else that isn't supported by the ArrowStream reader, in addition to force_2d option

At least on the level of read_dataframe, this seems easy to solve using shapely.force_2d? -> #300 adds this support.

is the automatic conversion of Surface geometries to linear ones, tested by test_read_multisurface (support for this was added in #140). But of those apply a transformation on the ogr_geometry object before converting this to WKB.

I suppose this should lead to a bug report for gdal? If I understand correctly, #140 is a workaround to avoid hitting a bug in gdal where gdal doesn't do the conversion to linear geometry types even though pyogrio always sets OGRSetNonLinearGeometriesEnabledFlag(0) before reading?

theroggy commented 1 year ago

When enabling tests for force_2d I noticed fid filter support is still missing as well, created issue for this: #301

jorisvandenbossche commented 1 year ago

Yes, you can see the keywords for which we raise an error here:

https://github.com/geopandas/pyogrio/blob/dffb5028cdfe0e7a55afb68cda7ed7cad303a5bc/pyogrio/_io.pyx#L1252-L1267

max_features and skip_features are now handled at the python level above this (in read_arrow), and force_2d you opened a PR to similarly handle that on the python side (but in one level higher up, in read_dataframe). I am afraid that the fids is not something we can easily handle on our side.

theroggy commented 1 year ago

I am afraid that the fids is not something we can easily handle on our side.

Using a where filter it should be possible at first sight, at least if max 1000 fids are requested... or I'll give it a try at least...

jorisvandenbossche commented 1 year ago

is the automatic conversion of Surface geometries to linear ones, tested by test_read_multisurface (support for this was added in #140). But of those apply a transformation on the ogr_geometry object before converting this to WKB.

I suppose this should lead to a bug report for gdal? If I understand correctly, #140 is a workaround to avoid hitting a bug in gdal where gdal doesn't do the conversion to linear geometry types even though pyogrio always sets OGRSetNonLinearGeometriesEnabledFlag(0) before reading?

Forgot to answer to this, but yes that might be something to bring up to GDAL, if we can make a reproducer for it. At least for the normal reading, it might be considered a bug. For the ArrowStream reading, not sure, here it might more be a feature request (the docs of OGRSetNonLinearGeometriesEnabledFlag don't mention that the ArrowStream reading is impacted, and I can imagine this is not necessarily easy to implement).

Reading those docs, I also see the note "Libraries should generally not use that method, since that could interfere with other libraries or applications." ..

theroggy commented 1 year ago

I created #307 to follow up adding support of automatic conversion of surface geometries to linear ones.

jorisvandenbossche commented 1 year ago

Thanks!