geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License
272 stars 22 forks source link

Add support for automatic conversion of surface geometries to linear ones when using use_arrow #307

Open theroggy opened 1 year ago

theroggy commented 1 year ago

As noted in #297, automatic conversion of surface geometries to linear ones doesn't seem to work (is some cases?), as tested by test_read_multisurface.

For reference, support for this for the automatic conversion use_arrow=False path was added in #140.

The following steps seem appropriate actions to get this solved in short term:

  1. It seems that #140 is a workaround to avoid hitting a bug in gdal where gdal doesn't do the conversion to linear geometry types even though pyogrio always sets OGRSetNonLinearGeometriesEnabledFlag(0) before reading. If this is the case, a bug report for gdal seems the way to go.
  2. Add a basic test on this automatic conversion to check if the arrow implementation already takes in account OGRSetNonLinearGeometriesEnabledFlag. If it does, solving point 1. will fix the support.

On longer term, or in parallel, the following remark in the doc of OGRSetNonLinearGeometriesEnabledFlag is "interesting": "Libraries should generally not use that method, since that could interfere with other libraries or applications." So possibly, to be checked with gdal developers, an alternative way to be able to request the conversion via the arrow interface could be considered.

brendan-ward commented 1 year ago

Crossref #166

Libraries should generally not use that method, since that could interfere with other libraries or applications.

But in our case we need geometry WKB values that can be directly ingested into Shapely / GEOS, which doesn't support nonlinear geometry types. Granted, we could return the original geometries as-is from read, but set a flag when reading via read_dataframe to always get those as linear types so that we don't break when we construct Shapely geometries from them. As per #166, we could do more of this conversion internally in order to support the surface types as basically special cases of Multipolygon / Polygon types.

In short, I think it is OK we are going against the recommendation of GDAL here; we've made a deliberate choice based on the libraries we're using to represent geometries.

rouault commented 12 months ago

using OGR_G_ForceTo() with OGR_GT_GetLinear() is an alternative to OGRSetNonLinearGeometriesEnabledFlag()

theroggy commented 12 months ago

using OGR_G_ForceTo() with OGR_GT_GetLinear() is an alternative to OGRSetNonLinearGeometriesEnabledFlag()

@rouault what is the approach to use when the arrow API is used to read the data?

rouault commented 12 months ago

what is the approach to use when the arrow API is used to read the data?

Currently, it would be up to the user to import the WKB a a OGRGeometry, use OGR_G_ForceTo() and export to WKB before doing something else with it.