essandess opened this issue 1 year ago
Could you get a backtrace with lldb?
Here's the macOS Crash Reporter output. What's the lldb command to capture the trace for `python3.10 -c 'import pyarrow'`?
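For reference, one way to capture such a trace (a sketch; assumes the Xcode command line tools' lldb is on PATH) is lldb's batch mode, where each `-o` command runs in order once the target is loaded:

```shell
# Run the crashing import under lldb; when it stops on the crash,
# print a backtrace and exit.
lldb -o run -o bt -o quit -- python3.10 -c 'import pyarrow'
```

Alternatively, start `lldb -- python3.10 -c 'import pyarrow'` interactively and type `run`, then `bt` at the `(lldb)` prompt after the crash.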
Also, I've tried setting a few more CMake flags to be more consistent with the build script at `python_wheel_macos_build.sh`, but I keep hitting the same issue.
Process: Python [51251]
Path: /opt/local/Library/Frameworks/Python.framework/Versions/3.10/Resources/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 3.10.12 (3.10.12)
Code Type: X86-64 (Native)
Parent Process: bash [15799]
Responsible: Terminal [15797]
User ID: 502
Date/Time: 2023-08-03 16:08:12.4564 -0400
OS Version: macOS 13.5 (22G74)
Report Version: 12
Bridge OS Version: 7.6 (20P6072)
Time Awake Since Boot: 810000 seconds
Time Since Wake: 607027 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000012340
Exception Codes: 0x0000000000000001, 0x0000000000012340
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [51251]
VM Region Info: 0x12340 is not in any region. Bytes before following region: 4352785600
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 103736000-10373a000 [ 16K] r-x/r-x SM=COW .../MacOS/Python
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libjemalloc.2.dylib 0x1059ec3df je_free_default + 193
1 libprotobuf.3.21.12.0.dylib 0x105688477 google::protobuf::internal::ArenaStringPtr::Destroy() + 39
2 libprotobuf.3.21.12.0.dylib 0x10571d5a5 google::protobuf::FileDescriptorProto::SharedDtor() + 149
3 libprotobuf.3.21.12.0.dylib 0x10571d6bd google::protobuf::FileDescriptorProto::~FileDescriptorProto() + 45
4 libprotobuf.3.21.12.0.dylib 0x105738657 google::protobuf::EncodedDescriptorDatabase::Add(void const*, int) + 167
5 libprotobuf.3.21.12.0.dylib 0x1056e34f4 google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) + 36
6 libprotobuf.3.21.12.0.dylib 0x1057544d9 google::protobuf::(anonymous namespace)::AddDescriptors(google::protobuf::internal::DescriptorTable const*) + 105
7 dyld 0x7ff81b4f93fb invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const + 175
8 dyld 0x7ff81b537b7a invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 242
9 dyld 0x7ff81b52bf22 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 577
10 dyld 0x7ff81b4dc0af dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 245
11 dyld 0x7ff81b52b0bf dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 175
12 dyld 0x7ff81b53773a dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 470
13 dyld 0x7ff81b4f666c dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 220
14 dyld 0x7ff81b4f685a dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 178
…
Here's the lldb trace:
Thanks. It seems that jemalloc is related.
Could you try disabling jemalloc when you build Apache Arrow C++? You can disable it by specifying the `-DARROW_JEMALLOC=OFF` CMake option.
Thank you, that fixes this issue. I'll bump this upstream to jemalloc.
In general, jemalloc is faster than the system memory allocator, so you may want to keep using jemalloc rather than disabling it.
Apache Arrow C++ uses a bundled jemalloc instead of the system jemalloc by default. This means that Apache Arrow C++ and Protobuf use different jemallocs in the same process, which may cause this problem. It may be solved by using the system jemalloc instead, by specifying `-Djemalloc_SOURCE=SYSTEM` rather than `-DARROW_JEMALLOC=OFF`.
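As a sketch, the two suggested configurations look like this (the source and build directory names are illustrative; the system-jemalloc variant assumes a jemalloc installation is available, e.g. from MacPorts or Homebrew):

```shell
# Option 1: build Arrow C++ without jemalloc entirely.
cmake -S arrow/cpp -B build-nojemalloc -DARROW_JEMALLOC=OFF

# Option 2: keep jemalloc, but link the system copy instead of the
# bundled one, so all libraries in the process resolve the same allocator.
cmake -S arrow/cpp -B build-sysjemalloc -Djemalloc_SOURCE=SYSTEM
```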
Thanks. MacPorts's `apache-arrow` already uses these system build flags, so this looks like an issue with jemalloc on macOS.
Several people have pointed out that `jemalloc` may not be the cause of this issue, so I am reopening the issue here. See:
Well, it depends. Does this issue still happen if you disable Arrow's jemalloc integration with `-DARROW_JEMALLOC=OFF`?
Whether this is the same issue or a different one, I am seeing a crash on `import pyarrow`, but it is a BAD INSTRUCTION crash. I see this on both Apple Silicon M1 and M1 Max, on Big Sur and Ventura, using the public wheels downloaded via pip. 12.0.1 is the last version that works; 13.0.0, 14.0.0, 14.0.1, and 14.0.2 all crash with BAD INSTRUCTION. This is not seen on an M2 Sonoma machine. Are the wheels being built on an M2 machine picking up a higher optimization or processor instruction flag?
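One way to narrow this down is to check which architectures the installed wheel's Arrow library was actually built for, and what CPU the machine reports. A sketch (the dylib filename glob and the package layout are assumptions about a typical pip install):

```shell
# Where is the installed pyarrow package?
python3 -c "import pyarrow; print(pyarrow.__file__)"

# Which architectures are baked into the bundled Arrow library?
# (Assumes the dylib sits next to the package's __init__.py.)
lipo -info "$(python3 -c 'import pyarrow, os; print(os.path.dirname(pyarrow.__file__))')"/libarrow.*.dylib

# What CPU does this machine report?
sysctl machdep.cpu.brand_string
```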
@prniii Can you get a low-level traceback like in https://github.com/apache/arrow/issues/37010#issuecomment-1664653511 ?
Also, this is running under Python.org's 3.11.5 for macOS.
Is this sufficient? This is `(lldb) bt` from within Xcode at the point my app crashed in the call to PyRun_String(); it submitted "import pyarrow".
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x4a03000)
* frame #0: 0x00000002cb9ae308 libarrow.1400.dylib`_armv7_neon_probe + 72
frame #1: 0x00000002cb9aeaa4 libarrow.1400.dylib`OPENSSL_cpuid_setup + 924
frame #2: 0x0000000196a381d8 dyld`invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const + 168
frame #3: 0x0000000196a79c60 dyld`invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 172
frame #4: 0x0000000196a6d1a4 dyld`invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 528
frame #5: 0x0000000196a182d8 dyld`dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 296
frame #6: 0x0000000196a6c1cc dyld`dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192
frame #7: 0x0000000196a6ecfc dyld`dyld3::MachOFile::forEachInitializerPointerSection(Diagnostics&, void (unsigned int, unsigned int, bool&) block_pointer) const + 160
frame #8: 0x0000000196a79904 dyld`dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 432
frame #9: 0x0000000196a3485c dyld`dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 448
frame #10: 0x0000000196a34c10 dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 220
frame #11: 0x0000000196a34bec dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 184
frame #12: 0x0000000196a34bec dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 184
frame #13: 0x0000000196a34bec dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 184
frame #14: 0x0000000196a34bec dyld`dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 184
frame #15: 0x0000000196a38264 dyld`dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_1::operator()() const + 112
frame #16: 0x0000000196a34d90 dyld`dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 304
frame #17: 0x0000000196a52d58 dyld`dyld4::APIs::dlopen_from(char const*, int, void*) + 1440
frame #18: 0x000000012b2004f0 Python`_imp_create_dynamic + 1304
frame #19: 0x000000012b0fe2d0 Python`cfunction_vectorcall_FASTCALL + 80
frame #20: 0x000000012b1bf3c8 Python`_PyEval_EvalFrameDefault + 63800
frame #21: 0x000000012b1c2264 Python`_PyEval_Vector + 156
frame #22: 0x000000012b09c15c Python`object_vacall + 292
frame #23: 0x000000012b09bf8c Python`PyObject_CallMethodObjArgs + 92
frame #24: 0x000000012b1fad10 Python`PyImport_ImportModuleLevelObject + 996
frame #25: 0x000000012b1a79bc Python`builtin___import__ + 168
frame #26: 0x000000012b0fe430 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 80
frame #27: 0x00000002a3031f68 libshiboken6.abi3.6.6.dylib`feature_import(_object*, _object*, _object*) + 168
frame #28: 0x000000012b0fda20 Python`cfunction_call + 60
frame #29: 0x000000012b099738 Python`_PyObject_MakeTpCall + 128
frame #30: 0x000000012b1b8850 Python`_PyEval_EvalFrameDefault + 36288
frame #31: 0x000000012b1ae814 Python`PyEval_EvalCode + 276
frame #32: 0x000000012b1a90a4 Python`builtin_exec + 428
frame #33: 0x000000012b0fe430 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 80
frame #34: 0x000000012b1bf3c8 Python`_PyEval_EvalFrameDefault + 63800
frame #35: 0x000000012b1c2264 Python`_PyEval_Vector + 156
frame #36: 0x000000012b09c15c Python`object_vacall + 292
frame #37: 0x000000012b09bf8c Python`PyObject_CallMethodObjArgs + 92
frame #38: 0x000000012b1fad10 Python`PyImport_ImportModuleLevelObject + 996
frame #39: 0x000000012b1a79bc Python`builtin___import__ + 168
frame #40: 0x000000012b0fe430 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 80
frame #41: 0x00000002a3031f68 libshiboken6.abi3.6.6.dylib`feature_import(_object*, _object*, _object*) + 168
frame #42: 0x000000012b0fda20 Python`cfunction_call + 60
frame #43: 0x000000012b099738 Python`_PyObject_MakeTpCall + 128
frame #44: 0x000000012b1b8850 Python`_PyEval_EvalFrameDefault + 36288
frame #45: 0x000000012b1ae814 Python`PyEval_EvalCode + 276
frame #46: 0x000000012b22bcbc Python`PyRun_StringFlags + 212
I see that this crash dump may be related to OpenSSL because I am running under the debugger. I am also looking at a crash dump file from a Release build, which is an address violation; I will post it after I examine the Release crash file.
@prniii Oops, it seems like running under lldb may produce the wrong traceback according to https://bugzilla.redhat.com/show_bug.cgi?id=1006474
There is something else that you could try (not under lldb):
1. `ulimit -c unlimited`
2. `python -c "import pyarrow"`
3. `lldb python -c /path/to/core/file`
Also cc @assignUser
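The steps above can be sketched as the following session (the core file name is hypothetical; by default macOS writes cores to `/cores/core.<pid>`, and that directory must exist and be writable, or no core is produced):

```shell
# Allow core dumps in this shell, then let the import crash
# outside of any debugger.
ulimit -c unlimited
python -c "import pyarrow"

# Find the freshly written core file.
ls /cores/

# Load the core into lldb post-mortem and get a backtrace
# with `bt` at the (lldb) prompt.
lldb python -c /cores/core.12345   # hypothetical core file name
```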
Is this issue being investigated?
@antheus-s If you can reproduce the issue, perhaps you could help us by providing a stack trace of the crash? See https://github.com/apache/arrow/issues/37010#issuecomment-1889644937 for possible instructions.
Sadly, I am not knowledgeable enough in this library to help; I was just curious whether anything was still being investigated, as the last reply was from January. I am using pythonnet, a library built upon Apache Arrow, to run C# code in Python and vice versa. It currently causes a runtime exception when I try to import any Python library.
If it happens when you try to import any Python library, then you should report the issue to pythonnet.
@pitrou, which I did months ago. However, the issue is caused by Apache Arrow, so I am wondering if anyone is working on a fix.
How do you know that the issue is caused by Apache Arrow? Can you point us to an analysis of the issue?
In my conversations with pythonnet contributors, it was explained to me as an issue that is currently fixed in this repo in this commit with a temporary workaround: https://github.com/apache/arrow/commit/7f0c4070dd723b2f7e1967d7f7f2cccf6fb256b7.
I am looking for progress, as I am currently forced to use a different setup for certain development tasks, but like I said, I am not familiar enough with this repository to dive into details. I'm just asking whether there is any ongoing investigation.
Thanks for the pointer, @antheus-s! This does not seem related to Arrow, since it occurs when importing any Python module, IIUC. Also, now that the pythonnet project has a detailed analysis to work with, hopefully they can devise a reliable solution.
Describe the bug, including details regarding any error messages, version, and platform.
The `pyarrow` packages create a segfault on import when compiled on macOS boxes. This happens on macOS `arm64` and `x86_64`, using (at least) versions 12.0.0 and 13.0.0.
Context: I am trying to update the MacPorts `apache-arrow` ports at https://github.com/macports/macports-ports/pull/19664 and discovered this issue. MacPorts uses the `apache-arrow` build instructions and approach in python.rst#build-and-test and python_wheel_macos_build.sh.
Behavior:
Component(s)
Python