apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python][C++] MemoryPool being destructed before its buffers are deallocated leads to segfault #16014

Open · asfimport opened 5 years ago

asfimport commented 5 years ago

Consider the following test function:

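def test_memory_pool():
    import pyarrow as pa
    # Wrap the default pool in a logging pool owned by this Python object.
    pool = pa.logging_memory_pool(pa.default_memory_pool())
    # The buffer is allocated from `pool` but does not keep `pool` alive.
    buf = pa.allocate_buffer(10, memory_pool=pool)
    # On return, if `pool` is collected before `buf`, freeing `buf` crashes.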

This fails with a segfault when pool is garbage-collected before buf. However, the following test function succeeds:

def test_memory_pool():
    import pyarrow as pa
    pool = pa.logging_memory_pool(pa.default_memory_pool())
    buf = pa.allocate_buffer(10, memory_pool=pool)
    del buf  # release the buffer explicitly while the pool is still alive

because the buffer is freed before the pool is destroyed.
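The failure mode can be modeled in pure Python. In this sketch, the hypothetical ToyPool and ToyBuffer classes stand in for Arrow's MemoryPool and Buffer; they are not part of pyarrow. The buffer keeps only a non-owning weakref to its pool, playing the role of the raw MemoryPool* pointer. Where C++ would dereference a dangling pointer and crash, the model simply reports that the release failed. It assumes CPython, where reference counting frees the pool as soon as its last owning reference disappears.

```python
import weakref


class ToyPool:
    """Hypothetical stand-in for a MemoryPool: tracks outstanding bytes."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, size):
        self.bytes_allocated += size
        return ToyBuffer(self, size)

    def free(self, size):
        self.bytes_allocated -= size


class ToyBuffer:
    """Hypothetical stand-in for a Buffer that does NOT own its pool."""

    def __init__(self, pool, size):
        # Non-owning reference, analogous to a raw MemoryPool* pointer.
        self._pool_ref = weakref.ref(pool)
        self.size = size

    def release(self):
        pool = self._pool_ref()
        if pool is None:
            # In C++ this would be a use-after-free (the reported segfault).
            return False
        pool.free(self.size)
        return True


def buffer_released_first():
    pool = ToyPool()
    buf = pool.allocate(10)
    ok = buf.release()  # like `del buf` while the pool is still alive
    del pool
    return ok


def pool_collected_first():
    pool = ToyPool()
    buf = pool.allocate(10)
    del pool  # the only owning reference goes away; CPython frees the pool now
    return buf.release()


print(buffer_released_first())  # True
print(pool_collected_first())   # False
```

The explicit release() method is used instead of __del__ because exceptions raised inside __del__ are only printed and then ignored, which would hide the failure.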

To fix this, every buffer created by a pool should hold a reference to that pool. This ensures that the pool stays alive until all of its buffers have been destroyed.
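A minimal sketch of the proposed fix, using hypothetical OwnedPool and OwningBuffer classes rather than Arrow's API: each buffer holds a strong (owning) reference to the pool that created it, which is the Python analogue of shared ownership. Dropping the caller's pool name then no longer destroys the pool; it is freed only after the last buffer goes away. A weakref is used only by the caller to observe when the pool is actually freed, and the sketch assumes CPython's immediate reference-counting deallocation.

```python
import weakref


class OwnedPool:
    """Hypothetical pool that tracks outstanding bytes."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, size):
        self.bytes_allocated += size
        return OwningBuffer(self, size)

    def free(self, size):
        self.bytes_allocated -= size


class OwningBuffer:
    """Hypothetical buffer that keeps its pool alive."""

    def __init__(self, pool, size):
        self._pool = pool  # strong reference: the pool outlives this buffer
        self.size = size

    def __del__(self):
        # Return the memory to the pool; the pool is guaranteed to exist here.
        self._pool.free(self.size)


def demo():
    pool = OwnedPool()
    pool_alive = weakref.ref(pool)  # observer only, does not own the pool
    buf1 = pool.allocate(10)
    buf2 = pool.allocate(20)
    del pool                         # caller drops its reference first
    after_del_pool = pool_alive() is not None
    del buf1                         # pool is still owned by buf2
    after_del_buf1 = pool_alive() is not None
    outstanding = pool_alive().bytes_allocated
    del buf2                         # last owner gone: the pool is freed too
    after_del_buf2 = pool_alive() is not None
    return after_del_pool, after_del_buf1, outstanding, after_del_buf2


print(demo())  # (True, True, 20, False)
```

The cost of this design is that a pool can no longer be torn down while any buffer still references it, which is exactly the lifetime guarantee the issue asks for.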

Reporter: Pearu Peterson / @pearu

PRs and other links:

Note: This issue was originally created as ARROW-4825. Please see the migration documentation for further details.

asfimport commented 5 years ago

Pearu Peterson / @pearu: As pointed out by @wesm, the pool is already attached to the Buffer instances, but only through a raw MemoryPool* pointer, which does not keep the pool alive.

I can confirm that changing MemoryPool* to std::shared_ptr<MemoryPool>& in the relevant functions fixes the issue.

Although this change would need to be applied throughout the Arrow C++ sources wherever MemoryPool is used (and there are a lot of such places), it seems like the right thing to do.