apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python][S3] Segmentation fault when running multithreading in Docker #39703

Open ovcharenko opened 8 months ago

ovcharenko commented 8 months ago

Describe the bug, including details regarding any error messages, version, and platform.

When running the simple test below in Docker, I get a segmentation fault with PyArrow v14.0.2. The same test works fine with earlier versions.

Sample test:

import concurrent.futures
import pyarrow.parquet as pq
from time import time

feeds = [
    "file1",
    "file2",
    ...
    "fileN",
]

def try_feed_load(feed: str):
    start = time()
    pq.read_table(f"s3://bucket_name/{feed}", filters=None)
    end = time()
    print(f"{feed} took {end - start} seconds")

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as pool_executor:
    for collected_data in pool_executor.map(try_feed_load, feeds):
        pass

To reproduce:

  1. Start a container: docker run --rm -it --init --ulimit core=-1 --mount type=bind,source=/tmp/,target=/tmp/ -v /root/work:/work --entrypoint /bin/bash amd64/python:3.11-slim
  2. Install most recent PyArrow package:
    cd /work
    python3 -m venv .venv_recent
    . .venv_recent/bin/activate
    pip install pyarrow # That will install pyarrow-14.0.2-cp311-cp311-manylinux_2_28_x86_64.whl.metadata
  3. Run a test. It fails immediately:
    python test2.py
    Segmentation fault (core dumped)

Workarounds:

  1. Change the script to run at most 1 worker
  2. Downgrade to PyArrow v14.0.1

Failure details:

 Thread 67 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8bf97956c0 (LWP 943)]
0x00007f8ca273cd50 in Aws::Utils::Threading::ReaderWriterLock::LockReader() () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400

(gdb) bt
#0  0x00007f8ca273cd50 in Aws::Utils::Threading::ReaderWriterLock::LockReader() () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#1  0x00007f8ca275ee62 in Aws::Config::ConfigAndCredentialsCacheManager::GetConfigProfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const
    () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#2  0x00007f8ca275f103 in Aws::Config::GetCachedConfigProfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#3  0x00007f8ca2761736 in Aws::Auth::STSAssumeRoleWebIdentityCredentialsProvider::STSAssumeRoleWebIdentityCredentialsProvider() ()
   from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#4  0x00007f8ca2763866 in Aws::Auth::DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain() () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#5  0x00007f8ca1fbf579 in arrow::fs::S3Options::ConfigureDefaultCredentials() () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#6  0x00007f8ca1fceac0 in arrow::fs::S3Options::FromUri(arrow::internal::Uri const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#7  0x00007f8ca12fa070 in arrow::fs::(anonymous namespace)::FileSystemFromUriReal(arrow::internal::Uri const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#8  0x00007f8ca12faa80 in arrow::fs::FileSystemFromUri(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#9  0x00007f8ca12fab7c in arrow::fs::FileSystemFromUriOrPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#10 0x00007f8ca12facf3 in arrow::fs::FileSystemFromUriOrPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () from /usr/local/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#11 0x00007f8bfa118712 in __pyx_pf_7pyarrow_3_fs_10FileSystem_2from_uri(_object*) () from /usr/local/lib/python3.11/site-packages/pyarrow/_fs.cpython-311-x86_64-linux-gnu.so
#12 0x00007f8ca50d5843 in PyObject_Vectorcall () from /usr/local/bin/../lib/libpython3.11.so.1.0
#13 0x00007f8ca50ca92b in _PyEval_EvalFrameDefault () from /usr/local/bin/../lib/libpython3.11.so.1.0
#14 0x00007f8ca50c94ba in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#15 0x00007f8ca50c4a5d in _PyObject_FastCallDictTstate () from /usr/local/bin/../lib/libpython3.11.so.1.0
#16 0x00007f8ca50e6244 in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#17 0x00007f8ca50c3721 in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#18 0x00007f8ca50c35cf in _PyObject_MakeTpCall () from /usr/local/bin/../lib/libpython3.11.so.1.0
#19 0x00007f8ca50ca92b in _PyEval_EvalFrameDefault () from /usr/local/bin/../lib/libpython3.11.so.1.0
#20 0x00007f8ca50c94ba in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#21 0x00007f8ca50ccab1 in _PyEval_EvalFrameDefault () from /usr/local/bin/../lib/libpython3.11.so.1.0
#22 0x00007f8ca50c94ba in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#23 0x00007f8ca50ccab1 in _PyEval_EvalFrameDefault () from /usr/local/bin/../lib/libpython3.11.so.1.0
#24 0x00007f8ca50c94ba in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#25 0x00007f8ca50f2e6d in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#26 0x00007f8ca516c3c4 in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#27 0x00007f8ca516c354 in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#28 0x00007f8ca4d72044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 0x00007f8ca4df1880 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
(gdb)

Component(s)

Python

samjsharpe commented 8 months ago

This may be related to https://github.com/apache/arrow/issues/38364. I believe the effect of that change is that S3 is no longer initialised up front but in each thread instead, and I can believe that introduces a race condition or something similar.

ovcharenko commented 8 months ago

Found another workaround: call ensure_s3_initialized() before the thread loop.
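
For anyone who wants to try this workaround, here is a minimal sketch of how it might look. It assumes ensure_s3_initialized is importable from pyarrow.fs; the helper names map_after_init and init_s3 are hypothetical and only for illustration, and bucket_name is the placeholder from the original test. PyArrow is imported lazily so the generic helper can be exercised without it.

```python
import concurrent.futures


def map_after_init(init, work, items, max_workers=6):
    """Call init() once on the calling thread, then fan work out to a thread pool.

    Hypothetical helper: the point is only that initialization happens
    before any worker thread exists.
    """
    init()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, items))


def init_s3():
    # Assumption: ensure_s3_initialized is exposed by pyarrow.fs.
    from pyarrow.fs import ensure_s3_initialized

    ensure_s3_initialized()


def try_feed_load(feed):
    # Same read as in the original test; "bucket_name" is a placeholder.
    import pyarrow.parquet as pq

    return pq.read_table(f"s3://bucket_name/{feed}", filters=None).num_rows
```

With the feeds list from the original test, the call would be map_after_init(init_s3, try_feed_load, feeds). I have only verified this against the failure described in this issue, so treat it as a sketch rather than a guaranteed fix.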

messense commented 6 months ago

I've also encountered this issue with PyArrow 15.0.0:

Thread 27 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb93e5640 (LWP 249826)]
0x00007fffca1ecef0 in Aws::Utils::Threading::ReaderWriterLock::LockReader() () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
(gdb) bt
#0  0x00007fffca1ecef0 in Aws::Utils::Threading::ReaderWriterLock::LockReader() () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#1  0x00007fffca20f002 in Aws::Config::ConfigAndCredentialsCacheManager::GetConfigProfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#2  0x00007fffca20f2a3 in Aws::Config::GetCachedConfigProfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#3  0x00007fffca2118d6 in Aws::Auth::STSAssumeRoleWebIdentityCredentialsProvider::STSAssumeRoleWebIdentityCredentialsProvider() () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#4  0x00007fffca213a06 in Aws::Auth::DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain() () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#5  0x00007fffc9a6f559 in arrow::fs::S3Options::ConfigureDefaultCredentials() () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#6  0x00007fffc9a7ecc0 in arrow::fs::S3Options::FromUri(arrow::internal::Uri const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#7  0x00007fffc8da7f73 in arrow::fs::(anonymous namespace)::FileSystemFromUriReal(arrow::internal::Uri const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#8  0x00007fffc8da8a40 in arrow::fs::FileSystemFromUri(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#9  0x00007fffc8da8df1 in arrow::fs::FileSystemFromUriOrPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::io::IOContext const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500
#10 0x00007fffc8da8e53 in arrow::fs::FileSystemFromUriOrPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /home/ubuntu/workspace/example/.venv/lib/python3.10/site-packages/pyarrow/libarrow.so.1500

$ pip freeze | grep pyarrow
pyarrow==15.0.0

pitrou commented 3 days ago

Has any of you tried with latest PyArrow (17.0.0)?

ovcharenko commented 2 days ago

> Has any of you tried with latest PyArrow (17.0.0)?

I can confirm that the problem is fixed in v17.0.0.
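
For code that has to run on several PyArrow versions, a version check along these lines could decide whether the ensure_s3_initialized() workaround is needed. This is only a sketch based on the reports in this thread: 14.0.1 works, 14.0.2 and 15.0.0 crash, and 17.0.0 is fixed. Nobody here reported on 16.x, so treating it as affected is a conservative assumption. The function names are hypothetical.

```python
import re


def parse_version(version):
    """Return the first three numeric components of a version string as a tuple."""
    nums = [int(p) for p in re.findall(r"\d+", version)[:3]]
    nums += [0] * (3 - len(nums))  # pad short versions such as "17" to (17, 0, 0)
    return tuple(nums)


def needs_s3_init_workaround(version):
    """True for versions between 14.0.2 (first reported crash) and 17.0.0 (reported fixed).

    Assumption: 16.x is treated as affected because it was not tested in this thread.
    """
    return (14, 0, 2) <= parse_version(version) < (17, 0, 0)
```

In practice this would be called with pyarrow.__version__, and ensure_s3_initialized() would be invoked before starting the thread pool only when it returns True.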