geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

BUG: read_arrow cannot read from bytes / buffer / file-like #401

Closed brendan-ward closed 1 month ago

brendan-ward commented 2 months ago

When reading from a file-like object, we first extract the bytes and write them to a VSI memory file, then clean that up after reading the data source. When reading via Arrow with read_arrow (the problem is ultimately in open_arrow), we inadvertently clean up the memory file before actually reading from it, because open_arrow does the following:

path, buffer = get_vsi_path(path_or_buffer)

try:
    # returns right away, before the caller has read anything
    return ogr_open_arrow(...)
finally:
    # so this cleanup runs too early
    if buffer:
        remove_virtual_file(path)

The finally block gets called before ogr_open_arrow has read from the memory file.
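To illustrate the ordering problem outside of pyogrio, here is a minimal pure-Python sketch. It assumes ogr_open_arrow behaves like a `contextlib.contextmanager` generator. The names `fake_open_arrow`, `fake_ogr_open_arrow` and the `MEMORY_FILES` dict are hypothetical stand-ins, not pyogrio or GDAL APIs.

```python
from contextlib import contextmanager

# Hypothetical stand-in for GDAL's in-memory (VSI) filesystem.
MEMORY_FILES = {}

@contextmanager
def fake_ogr_open_arrow(path):
    # This body does not run until the caller enters the `with` block.
    yield MEMORY_FILES[path]

def fake_open_arrow(data):
    path = "/vsimem/example"
    MEMORY_FILES[path] = data
    try:
        # Only creates the context manager object; nothing is read yet.
        return fake_ogr_open_arrow(path)
    finally:
        # Runs as soon as this function returns, i.e. before any read.
        MEMORY_FILES.pop(path, None)

def try_read(data):
    try:
        with fake_open_arrow(data) as stream:
            return stream
    except KeyError:
        # The memory file was removed before the generator body ran.
        return "memory file already removed"

result = try_read(b"abc")
```

In this sketch the read fails with a `KeyError` because the cleanup in `fake_open_arrow` has already run by the time the `with` block is entered.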

The cleanest fix is likely to move all handling of the memory file into ogr_open_arrow, and let it clean up the memory file in its own finally block.
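A rough sketch of what that could look like, again in pure Python with hypothetical stand-ins (`fake_ogr_open_arrow`, `fake_open_arrow`, `MEMORY_FILES`) rather than the real pyogrio functions: the generator creates the memory file, yields the stream, and removes the file in its own finally block, so cleanup only happens once the caller is done.

```python
from contextlib import contextmanager

# Hypothetical stand-in for GDAL's in-memory (VSI) filesystem.
MEMORY_FILES = {}
PATH = "/vsimem/example"

@contextmanager
def fake_ogr_open_arrow(data):
    # The generator owns the memory file for its whole lifetime.
    MEMORY_FILES[PATH] = data
    try:
        yield MEMORY_FILES[PATH]
    finally:
        # Runs only after the caller leaves the `with` block.
        MEMORY_FILES.pop(PATH, None)

def fake_open_arrow(data):
    # The wrapper no longer does any cleanup of its own.
    return fake_ogr_open_arrow(data)

with fake_open_arrow(b"abc") as stream:
    contents = stream
    alive_during_read = PATH in MEMORY_FILES

removed_after = PATH not in MEMORY_FILES
```

The wrapper stays a plain function here, which keeps all of the yield / cleanup logic in one place.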

jorisvandenbossche commented 2 months ago

Yeah, ogr_open_arrow itself uses yield to return the stream, so the stream can be consumed before the finally block inside ogr_open_arrow runs. We would need to do the same in open_arrow, but I'm not sure you can yield twice in a chain. But indeed, we could also move the virtual file cleanup into ogr_open_arrow, so that all yield / cleanup logic is handled at that level.
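On the "yield twice in a chain" question: in plain Python, one `contextlib.contextmanager` generator can wrap another and re-yield its value. The sketch below shows that pattern with hypothetical stand-ins (`fake_open_arrow`, `fake_ogr_open_arrow`, `MEMORY_FILES`). Whether it fits pyogrio's actual code is an assumption that is not checked here.

```python
from contextlib import contextmanager

# Hypothetical stand-in for GDAL's in-memory (VSI) filesystem.
MEMORY_FILES = {}
events = []

@contextmanager
def fake_ogr_open_arrow(path):
    try:
        events.append("inner: open")
        yield MEMORY_FILES[path]
    finally:
        events.append("inner: close")

@contextmanager
def fake_open_arrow(data):
    path = "/vsimem/example"
    MEMORY_FILES[path] = data
    try:
        # Enter the inner context manager and pass its stream through.
        with fake_ogr_open_arrow(path) as stream:
            yield stream
    finally:
        # Runs after the inner context manager has closed.
        MEMORY_FILES.pop(path, None)
        events.append("outer: removed memory file")

with fake_open_arrow(b"abc") as stream:
    events.append(f"caller: read {stream!r}")
```

Here the caller reads first, the inner context manager closes second, and the memory file is removed last. The cost of chaining is that the cleanup logic is spread across two levels.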