geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

ENH: refactor handling of reading from in-memory dataset #407

Closed brendan-ward closed 1 month ago

brendan-ward commented 2 months ago

Resolves #401

This moves all handling of creation / destruction of the in-memory dataset used for read operations to the Cython tier, and now passes either a string or bytes to the read_info(), list_layers(), read_bounds(), read(), and read_arrow() functions. I think this also sets us up for later refactors where we may pass file-like / filesystem-like objects directly down to Cython in order to use the GDAL virtual filesystem plugin.
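As a rough illustration of the "string or bytes" contract described above, the Python tier could reduce each supported input to one of those two types before calling into Cython. This is only a sketch under assumptions: `normalize_source` is a hypothetical name, not a function from this PR, and the real dispatch logic may differ.

```python
import io
from pathlib import Path


def normalize_source(path_or_buffer):
    """Reduce a supported input to str (a path) or bytes (file contents).

    Hypothetical helper for illustration only; not part of pyogrio's API.
    """
    if isinstance(path_or_buffer, Path):
        # Paths are handed down as plain strings
        return str(path_or_buffer)
    if isinstance(path_or_buffer, (str, bytes)):
        # Already one of the two types the lower tier accepts
        return path_or_buffer
    if hasattr(path_or_buffer, "read"):
        # File-like object: read its full contents into a bytes buffer
        data = path_or_buffer.read()
        if not isinstance(data, bytes):
            raise TypeError("file-like objects must be opened in binary mode")
        return data
    raise TypeError(f"unsupported input type: {type(path_or_buffer).__name__}")


print(type(normalize_source("data.geojson")).__name__)
print(type(normalize_source(io.BytesIO(b"{}"))).__name__)
```

With this shape, only the lower tier needs to know how a bytes buffer becomes an in-memory dataset; callers just see paths, bytes, or file-like objects.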

To keep things organized, I split the VSI-related Cython functions into the new _vsi.pyx / _vsi.pxd files.

Because the bytes buffer is passed as a parameter, it remains in scope for the duration of the Cython function, so we no longer need to hold an extra handle on it as before. (The handle was required to prevent Python from deallocating the buffer while GDAL might still be using it to represent the in-memory dataset.)

This expands the tests to verify that the core functions still work correctly when passed bytes or file-like objects. To avoid the various GPKG-related errors that occur when working with that format in memory, I instead added new test fixtures that create a GeoJSON file from the first 3 records of the naturalearth_lowres dataset.
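The idea behind such a fixture can be sketched with the standard library alone: take the first few features, serialize them as a GeoJSON FeatureCollection, and expose the result both as bytes and as a file-like object. The helper name and the placeholder point records below are assumptions made for illustration; the actual fixtures in this PR are built from the naturalearth_lowres dataset and may be written differently.

```python
import io
import json


def first_n_features_geojson(features, n=3):
    """Serialize the first n features as GeoJSON bytes (hypothetical helper)."""
    collection = {"type": "FeatureCollection", "features": list(features)[:n]}
    return json.dumps(collection).encode("utf-8")


# Placeholder records standing in for the real dataset
records = [
    {
        "type": "Feature",
        "properties": {"id": i},
        "geometry": {"type": "Point", "coordinates": [float(i), float(i)]},
    }
    for i in range(5)
]

geojson_bytes = first_n_features_geojson(records)  # for bytes-based tests
geojson_filelike = io.BytesIO(geojson_bytes)       # for file-like tests
```

The same bytes can then be passed to each read function directly or wrapped in `io.BytesIO`, so both input types are exercised against identical content.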

This also adds a specific check that the incoming path does not already contain /vsimem/, since we have to handle that internally and cannot allow it to be passed in.
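A guard of this kind might look roughly like the following. This is a minimal sketch, assuming a simple substring check; `ensure_no_vsimem` and its error message are hypothetical, and the check actually added in this PR may be located and worded differently.

```python
def ensure_no_vsimem(path):
    """Reject string paths that already point into GDAL's /vsimem/ filesystem.

    Hypothetical helper for illustration only.
    """
    if isinstance(path, str) and "/vsimem/" in path:
        raise ValueError(
            "path cannot contain '/vsimem/'; in-memory datasets are managed internally"
        )
    return path


print(ensure_no_vsimem("data.geojson"))
```

Bytes inputs pass through unchanged here, since only string paths can name a /vsimem/ location.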