Closed brendan-ward closed 2 days ago
After splicing in changes from #349 to correctly release the Arrow stream, I'm no longer able to easily reproduce the segfaults on repeated reruns of CI. Given that these were intermittent, it's hard to say for certain, but per the GDAL docs, we should have been releasing the stream:
On successful return, and when the stream interfaces is no longer needed, it must must be freed...
Clarification, it seems that the stream.release
is not being called (is NULL
) - presumably because the stream was used in the pyarrow reader? But free(stream)
is getting called. Not sure why either of these would avoid an unpredictable segfault, it seems more likely that initializing the .release
method to NULL
on the ArrowArrayStream
prevents calling a random pointer (per the comment), even though the .release
method was not defined prior to these changes (maybe something expecting that interface was expecting a .release
anyway).
I merged https://github.com/geopandas/pyogrio/pull/349, so let's see if the failures now disappear on main and other PRs.
Although to be honest I also don't fully understand how the changes would fix the crashes ..
Clarification, it seems that the
stream.release
is not being called (isNULL
) - presumably because the stream was used in the pyarrow reader?
Indeed, that's the idea. When we pass the struct to pyarrow, they will "move" it and take ownership of it (i.e. being responsible of releasing it when done reading the data) by setting the release callback to NULL, to ensure we don't call that anymore. (https://arrow.apache.org/docs/dev/format/CDataInterface.html#moving-an-array)
Hmm, still a crash on the merge commit on main .. https://github.com/geopandas/pyogrio/actions/runs/8718790747/job/23916758253
Split up dataframe read with
where
test to try and probe at #384Also tries to remove
IF
blocks marked as deprecated by Cython, except for those still needed for GDAL version specific functions.