AprilRobotics / apriltag

AprilTag is a visual fiducial system popular for robotics research.
https://april.eecs.umich.edu/software/apriltag

segmentation fault python macos #352

Open iBims1JFK opened 5 days ago

iBims1JFK commented 5 days ago

Hello, I am trying to build the Python bindings on macOS. The build succeeds, but importing the library always ends in a segmentation fault. I am using Python 3.12.6 in a conda environment. I was able to build the duckietown bindings, but with them I get a lot of "Error, more than one new minimum found." errors. I did not find an actual solution for this, and it seems that this error does not exist with the official bindings.

Python 3.12.6 | packaged by conda-forge | (main, Sep 11 2024, 04:55:15) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import apriltag
zsh: segmentation fault  python

If I should provide more system information please let me know.

christian-rauch commented 5 days ago

Yes. I briefly looked into this for the conda package, where the tests for the Python bindings failed due to this crash. However since I do not have a macOS system, I cannot really debug this issue.

If I should provide more system information please let me know.

I guess it would help to get a backtrace. That would mean that you have to compile the project and the Python bindings with Debug symbols, then run the Python interpreter in gdb or similar, reproduce this crash, and finally print the backtrace and paste it here.
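As a lighter first step before a full debugger session, the import can be run in a child interpreter with Python's standard `faulthandler` enabled. This is only a sketch: the helper name `import_in_subprocess` is hypothetical, and `faulthandler` reports the Python-level traceback at the time of the crash, not the C backtrace requested above, so it complements the debugger rather than replacing it.

```python
import subprocess
import sys


def import_in_subprocess(module_name):
    """Import a module in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means
    the child was terminated by that signal number.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling `import_in_subprocess("apriltag")` from the same environment that crashes keeps the calling process alive, so the return code and the captured stderr can be pasted into the issue.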

iBims1JFK commented 3 days ago

I am happy to help with the debugging process. Unfortunately, I am not experienced in this area, so I fear you will need to walk me through it a bit. What I did was compile the library with the following command:

cmake -B build -DCMAKE_BUILD_TYPE=Debug \
      -DBUILD_SHARED_LIBS=ON \
      -DBUILD_PYTHON_WRAPPER=ON \
      -DPython3_EXECUTABLE=$(which python3) \
      -DPython3_INCLUDE_DIR=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/include/python3.11 \
      -DPython3_LIBRARY=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/lib/libpython3.11.dylib \
      -DPython3_NUMPY_INCLUDE_DIR=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/lib/python3.11/site-packages/numpy/core/include

-- The C compiler identification is Clang 16.0.6
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/arm64-apple-darwin20.0.0-clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- The CXX compiler identification is Clang 16.0.6
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/arm64-apple-darwin20.0.0-clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (1.4s)
-- Generating done (0.0s)
CMake Warning:
  Manually-specified variables were not used by the project:

    Python3_NUMPY_INCLUDE_DIR

-- Build files have been written to: /Users/jonathan/Documents/master-thesis/apriltag-test/apriltag/build

gdb is not available for Apple silicon, so I used lldb. If there is a better alternative, please let me know.

lldb /opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python
(lldb) target create "/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python"
Current executable set to '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64).
(lldb) run test.py
Process 15532 launched: '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64)
Process 15532 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x0000000101367290 libpython3.11.dylib`type_ready + 92
libpython3.11.dylib`type_ready:
->  0x101367290 <+92>:  ldr    x8, [x8, #0x10]
    0x101367294 <+96>:  ldr    w9, [x8, #0x1428]
    0x101367298 <+100>: cbz    w9, 0x101367338 ; <+260>
    0x10136729c <+104>: sub    w9, w9, #0x1
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x0000000101367290 libpython3.11.dylib`type_ready + 92
    frame #1: 0x0000000101377de8 libpython3.11.dylib`PyType_Ready + 52
    frame #2: 0x00000001008c1864 apriltag.cpython-311-darwin.so`PyInit_apriltag + 32
    frame #3: 0x000000010019bbc0 python`_imp_create_dynamic + 1188
    frame #4: 0x00000001000b8e20 python`cfunction_vectorcall_FASTCALL + 256
    frame #5: 0x0000000100164b6c python`_PyEval_EvalFrameDefault + 55160
    frame #6: 0x0000000100166e18 python`_PyEval_Vector + 184
    frame #7: 0x000000010006286c python`object_vacall + 316
    frame #8: 0x0000000100062668 python`PyObject_CallMethodObjArgs + 108
    frame #9: 0x0000000100196c04 python`PyImport_ImportModuleLevelObject + 1580
    frame #10: 0x000000010015f0d0 python`_PyEval_EvalFrameDefault + 31964
    frame #11: 0x0000000100156424 python`PyEval_EvalCode + 220
    frame #12: 0x00000001001bc3f8 python`run_mod + 144
    frame #13: 0x00000001001bbe58 python`_PyRun_SimpleFileObject + 1260
    frame #14: 0x00000001001baf18 python`_PyRun_AnyFileObject + 240
    frame #15: 0x00000001001e192c python`Py_RunMain + 3100
    frame #16: 0x00000001001e2784 python`pymain_main + 1252
    frame #17: 0x0000000100003684 python`main + 56
    frame #18: 0x0000000182b6f154 dyld`start + 2476

I hope that this helps.

christian-rauch commented 3 days ago

Building with -DCMAKE_BUILD_TYPE=Debug is the right start. But your backtrace does not show where in apriltag.cpython-311-darwin.so it crashes. Can you print the source code lines?

If I force a crash via printf("NULL: %d\n", *(int*)NULL); (NULL-pointer dereference) in PyInit_apriltag(void) and import the module via python3 -c "import apriltag" I can reproduce a crash and backtrace that with gdb:

gdb -ex run --args python3 -c "import apriltag"

which will give:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fadb3f in PyInit_apriltag () at [...]/apriltag/apriltag_pywrap.c:380
380     printf("NULL: %d\n", *(int*)NULL);
(gdb) bt
#0  0x00007ffff7fadb3f in PyInit_apriltag () at [...]/apriltag/apriltag_pywrap.c:380
#1  0x00000000006a9881 in _PyImport_LoadDynamicModuleWithSpec (spec=0x7ffff74349e0, fp=<optimized out>) at ../Python/importdl.c:169
#2  0x00000000006a8fd2 in _imp_create_dynamic_impl (module=<optimized out>, file=0x0, spec=0x7ffff74349e0) at ../Python/import.c:3775
#3  _imp_create_dynamic (module=<optimized out>, args=<optimized out>, nargs=<optimized out>) at ../Python/clinic/import.c.h:506
#4  0x0000000000582067 in cfunction_vectorcall_FASTCALL (func=0x7ffff75972e0, args=0x7ffff75fc928, nargsf=<optimized out>, kwnames=<optimized out>)
    at ../Include/cpython/methodobject.h:50
#5  0x00000000005db336 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:3254
#6  0x0000000000549ae7 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd500, callable=0x7ffff75a4040, tstate=0xba6048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#7  object_vacall (tstate=tstate@entry=0xba6048 <_PyRuntime+459656>, base=<optimized out>, callable=0x7ffff75a4040, vargs=0x7fffffffd588) at ../Objects/call.c:850
#8  0x000000000054b373 in PyObject_CallMethodObjArgs (obj=<optimized out>, name=<optimized out>) at ../Objects/call.c:911
#9  0x00000000005fda35 in import_find_and_load (abs_name=0x7ffff743de30, tstate=0xba6048 <_PyRuntime+459656>) at ../Python/import.c:2779
#10 PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff743de30, globals=<optimized out>, locals=locals@entry=0x7ffff75f9e80, fromlist=fromlist@entry=0xa408a0 <_Py_NoneStruct>, 
    level=0) at ../Python/import.c:2862
#11 0x00000000005dc40f in import_name (level=0xb36988 <_PyRuntime+3272>, fromlist=0xa408a0 <_Py_NoneStruct>, name=0x7ffff743de30, frame=<optimized out>, tstate=<optimized out>)
    at ../Python/ceval.c:2482
#12 _PyEval_EvalFrameDefault (tstate=tstate@entry=0xba6048 <_PyRuntime+459656>, frame=<optimized out>, frame@entry=0x7ffff7fb2020, throwflag=throwflag@entry=0)
    at Python/bytecodes.c:2135
#13 0x00000000005d560b in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb2020, tstate=0xba6048 <_PyRuntime+459656>) at ../Include/internal/pycore_ceval.h:89
#14 _PyEval_Vector (kwnames=0x0, argcount=0, args=0x0, locals=0x7ffff75f9e80, func=0x7ffff744d3a0, tstate=0xba6048 <_PyRuntime+459656>) at ../Python/ceval.c:1683
#15 PyEval_EvalCode (co=co@entry=0x7ffff748cfa0, globals=globals@entry=0x7ffff75f9e80, locals=locals@entry=0x7ffff75f9e80) at ../Python/ceval.c:578
#16 0x00000000006086f3 in run_eval_code_obj (locals=0x7ffff75f9e80, globals=0x7ffff75f9e80, co=0x7ffff748cfa0, tstate=0xba6048 <_PyRuntime+459656>) at ../Python/pythonrun.c:1722
#17 run_mod (arena=0x7ffff751be50, flags=0x7ffff751be50, locals=0x7ffff75f9e80, globals=0x7ffff75f9e80, filename=<optimized out>, mod=<optimized out>) at ../Python/pythonrun.c:1743
#18 PyRun_StringFlags (str=str@entry=0x7ffff75fa050 "import apriltag\n", start=start@entry=257, globals=0x7ffff75f9e80, locals=0x7ffff75f9e80, flags=flags@entry=0x7fffffffd9c0)
    at ../Python/pythonrun.c:1618
#19 0x00000000006b40ee in PyRun_SimpleStringFlags (command=0x7ffff75fa050 "import apriltag\n", flags=flags@entry=0x7fffffffd9c0) at ../Python/pythonrun.c:480
#20 0x00000000006bce01 in pymain_run_command (command=<optimized out>) at ../Modules/main.c:255
#21 pymain_run_python (exitcode=0x7fffffffd98c) at ../Modules/main.c:620
#22 Py_RunMain () at ../Modules/main.c:709
#23 0x00000000006bc81d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:763
#24 0x00007ffff7c2a1ca in __libc_start_call_main (main=main@entry=0x518880 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdbd8) at ../sysdeps/nptl/libc_start_call_main.h:58
#25 0x00007ffff7c2a28b in __libc_start_main_impl (main=0x518880 <main>, argc=3, argv=0x7fffffffdbd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdbc8) at ../csu/libc-start.c:360
#26 0x0000000000657ca5 in _start ()

Line 380 is then exactly the line where I added the NULL-pointer dereference.
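For readers on macOS, where lldb is used instead of gdb, the same kind of non-interactive run can be assembled roughly as follows. This is a sketch under assumptions: the helper `debugger_argv` is hypothetical, the gdb form mirrors the command quoted above, and the lldb form assumes an lldb build that supports the `--batch`, `-o` (run a command after loading) and `-k` (run a command if the target crashes) options.

```python
import sys


def debugger_argv(debugger, module_name, python=sys.executable):
    """Build an argv list that imports a module under gdb or lldb."""
    target = [python, "-c", f"import {module_name}"]
    if debugger == "gdb":
        # Same shape as the gdb invocation shown in this thread.
        return ["gdb", "-ex", "run", "--args"] + target
    if debugger == "lldb":
        # -o commands run after the target is loaded; -k commands run
        # only if the target crashes, so "bt" is printed at the fault.
        return ["lldb", "--batch", "-o", "run", "-k", "bt", "--"] + target
    raise ValueError(f"unsupported debugger: {debugger}")
```

The returned list can be passed to `subprocess.run(...)` or joined and pasted into a terminal. Source lines only appear in the backtrace if the extension was built with debug symbols.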

christian-rauch commented 3 days ago

The line libpython3.11.dylib`PyType_Ready + 52 in your backtrace suggests that this is caused by PyType_Ready(&apriltagType).

Since you do not seem to be able to use a debugger for this at the moment, could you simply add a print statement like this:

    if (PyType_Ready(&apriltagType) < 0) {
        printf("PyType_Ready error!\n"); fflush(stdout);
        return NULL;
    }

and check if it prints on the screen?

christian-rauch commented 3 days ago

I added a test for python3 -c "import apriltag; apriltag.apriltag(family='tag36h11')" to the CI (https://github.com/AprilRobotics/apriltag/pull/353). This runs without crashes on a macos-14-arm64 runner: https://github.com/AprilRobotics/apriltag/actions/runs/10873853033/job/30170448778?pr=353.

This might just be related to a weird Python setup with a mixup of versions from different sources. Your initial report shows that you are using Python 3.12.6 but later you compile against libpython3.11.dylib. This very likely causes errors. I also see that you use homebrew for Python. Can you run this again with a standard Python installation outside of homebrew etc.?
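One way to look for such a mixup without triggering the crash is to ask the import system where the module would be loaded from, and compare the file name with the suffix the running interpreter expects. The sketch below uses only the standard library; the function names are hypothetical. A mismatch is a hint rather than proof, because untagged or stable-ABI extension files can also be loadable.

```python
import importlib.util
import os
import sys
import sysconfig


def extension_suffix():
    """Filename suffix this interpreter expects for compiled extensions."""
    return sysconfig.get_config_var("EXT_SUFFIX")


def built_for_this_interpreter(path):
    """True if the file name carries this interpreter's extension suffix."""
    return os.path.basename(path).endswith(extension_suffix())


def locate(module_name):
    """Path the import system would load, found without importing it."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


def report(module_name):
    """Summarise the running interpreter and where a module resolves to."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "expected_suffix": extension_suffix(),
        "module_path": locate(module_name),
    }
```

Running `report("apriltag")` inside each environment shows which interpreter is active and which file it would pick up. In this thread the extension file is named `apriltag.cpython-311-darwin.so`, so it would only be expected to match a CPython 3.11 interpreter on macOS.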

iBims1JFK commented 3 days ago

I recompiled it and judging from apriltag.cpython-311-darwin.so PyInit_apriltag at apriltag_pywrap.c:375:9 [opt] the symbols work (better?) now. And I think expectedly it did not print anything.

lldb /opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python
(lldb) target create "/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python"
Current executable set to '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64).
(lldb) run test.py
Process 91701 launched: '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64)
Process 91701 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x000000010137f290 libpython3.11.dylib`type_ready + 92
libpython3.11.dylib`type_ready:
->  0x10137f290 <+92>:  ldr    x8, [x8, #0x10]
    0x10137f294 <+96>:  ldr    w9, [x8, #0x1428]
    0x10137f298 <+100>: cbz    w9, 0x10137f338 ; <+260>
    0x10137f29c <+104>: sub    w9, w9, #0x1
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x000000010137f290 libpython3.11.dylib`type_ready + 92
    frame #1: 0x000000010138fde8 libpython3.11.dylib`PyType_Ready + 52
    frame #2: 0x00000001008c1790 apriltag.cpython-311-darwin.so`PyInit_apriltag at apriltag_pywrap.c:375:9 [opt]
    frame #3: 0x000000010019bbc0 python`_imp_create_dynamic + 1188
    frame #4: 0x00000001000b8e20 python`cfunction_vectorcall_FASTCALL + 256
    frame #5: 0x0000000100164b6c python`_PyEval_EvalFrameDefault + 55160
    frame #6: 0x0000000100166e18 python`_PyEval_Vector + 184
    frame #7: 0x000000010006286c python`object_vacall + 316
    frame #8: 0x0000000100062668 python`PyObject_CallMethodObjArgs + 108
    frame #9: 0x0000000100196c04 python`PyImport_ImportModuleLevelObject + 1580
    frame #10: 0x000000010015f0d0 python`_PyEval_EvalFrameDefault + 31964
    frame #11: 0x0000000100156424 python`PyEval_EvalCode + 220
    frame #12: 0x00000001001bc3f8 python`run_mod + 144
    frame #13: 0x00000001001bbe58 python`_PyRun_SimpleFileObject + 1260
    frame #14: 0x00000001001baf18 python`_PyRun_AnyFileObject + 240
    frame #15: 0x00000001001e192c python`Py_RunMain + 3100
    frame #16: 0x00000001001e2784 python`pymain_main + 1252
    frame #17: 0x0000000100003684 python`main + 56
    frame #18: 0x0000000182b6f154 dyld`start + 2476

iBims1JFK commented 3 days ago

The different Python versions were used in different environments, but each build was always compiled against the correct version for its environment. I will definitely check with the standard version, although I need it to work in the conda environment.

christian-rauch commented 3 days ago

I recompiled it and judging from apriltag.cpython-311-darwin.so PyInit_apriltag at apriltag_pywrap.c:375:9 [opt] the symbols work (better?) now. And I think expectedly it did not print anything.

Did you add a print statement?

At least now you get the line numbers. apriltag.cpython-311-darwin.so`PyInit_apriltag at apriltag_pywrap.c:375:9 tells us that this is probably related to return NULL;. That is why I asked to print something before the return NULL;. I suspect that, once you add the print statements, you will see that it goes into this branch because PyType_Ready(&apriltagType) < 0.

iBims1JFK commented 3 days ago

Interestingly, it does work like a charm when I am building outside of the conda environment but still with the brew python. I added this line as you told me

    if (PyType_Ready(&apriltagType) < 0)
    {
        printf("PyType_Ready error!\n");
        fflush(stdout);
        return NULL;
    }

But I think it crashes at if (PyType_Ready(&apriltagType) < 0) which is line 375 and therefore cannot print anything.

christian-rauch commented 3 days ago

Interestingly, it does work like a charm when I am building outside of the conda environment but still with the brew python.

That sounds definitely like a mixup of environments.

But I think it crashes at if (PyType_Ready(&apriltagType) < 0) which is line 375 and therefore cannot print anything.

Which version of the code are you on? On the current master, line 375 is return NULL;: https://github.com/AprilRobotics/apriltag/blob/786ad11fa812524f33ad8375a5f157b7e57b730d/apriltag_pywrap.c#L374-L375

If this line appears in the backtrace, a print before this, e.g.:

    if (PyType_Ready(&apriltagType) < 0) {                  // 374
        printf("PyType_Ready error!\n"); fflush(stdout);    // 375
        return NULL;                                        // 376
    }

should have been shown in the terminal.

iBims1JFK commented 3 days ago

But shouldn't it matter what environments you have, as long as you choose the right ones when building the library?

I am using the current master despite adding the lines that you asked me to. It is possible that some auto-linting misaligned some of the stuff. Now I added a print statement before the if statement. So the line count is off by one:

    printf("Before PyType_Ready\n");     // 375
    if (PyType_Ready(&apriltagType) < 0) // 376
    {                                    // 377
        printf("PyType_Ready error!\n"); // 378
        fflush(stdout);                  // 379
        return NULL;                     // 380
    }                                    // 381

lldb /opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python
(lldb) target create "/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python"
Current executable set to '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64).
(lldb) run bin_test/test.py
Process 43823 launched: '/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python' (arm64)
Before PyType_Ready
Process 43823 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x000000010137f290 libpython3.11.dylib`type_ready + 92
libpython3.11.dylib`type_ready:
->  0x10137f290 <+92>:  ldr    x8, [x8, #0x10]
    0x10137f294 <+96>:  ldr    w9, [x8, #0x1428]
    0x10137f298 <+100>: cbz    w9, 0x10137f338 ; <+260>
    0x10137f29c <+104>: sub    w9, w9, #0x1
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x000000010137f290 libpython3.11.dylib`type_ready + 92
    frame #1: 0x000000010138fde8 libpython3.11.dylib`PyType_Ready + 52
    frame #2: 0x00000001008c1780 apriltag.cpython-311-darwin.so`PyInit_apriltag at apriltag_pywrap.c:376:9 [opt]
    frame #3: 0x000000010019bbc0 python`_imp_create_dynamic + 1188
    frame #4: 0x00000001000b8e20 python`cfunction_vectorcall_FASTCALL + 256
    frame #5: 0x0000000100164b6c python`_PyEval_EvalFrameDefault + 55160
    frame #6: 0x0000000100166e18 python`_PyEval_Vector + 184
    frame #7: 0x000000010006286c python`object_vacall + 316
    frame #8: 0x0000000100062668 python`PyObject_CallMethodObjArgs + 108
    frame #9: 0x0000000100196c04 python`PyImport_ImportModuleLevelObject + 1580
    frame #10: 0x000000010015f0d0 python`_PyEval_EvalFrameDefault + 31964
    frame #11: 0x0000000100156424 python`PyEval_EvalCode + 220
    frame #12: 0x00000001001bc3f8 python`run_mod + 144
    frame #13: 0x00000001001bbe58 python`_PyRun_SimpleFileObject + 1260
    frame #14: 0x00000001001baf18 python`_PyRun_AnyFileObject + 240
    frame #15: 0x00000001001e192c python`Py_RunMain + 3100
    frame #16: 0x00000001001e2784 python`pymain_main + 1252
    frame #17: 0x0000000100003684 python`main + 56
    frame #18: 0x0000000182b6f154 dyld`start + 2476

christian-rauch commented 3 days ago

But shouldn't it matter what environments you have, as long as you choose the right ones when building the library?

No. The build and runtime environments have to be the same. Linking against a different library than you use later can cause different kinds of ABI incompatibilities.

I am using the current master despite adding the lines that you asked me to. It is possible that some auto-linting misaligned some of the stuff.

Well, ideally your backtraces match with the source code of the repo. Otherwise, it's hard to use it to debug what is going on.

If the crash is indeed inside PyType_Ready, then there is nothing the apriltag bindings can do to fix this.

If you can reproduce the crash in the CI, I can have a look at it. But other than this, I recommend that you fix your Python environment.

iBims1JFK commented 3 days ago

To clarify, what I meant was using the same build and runtime environment. That is, when I specify the ros_env during the build like this

cmake -B build -DCMAKE_BUILD_TYPE=Debug \
      -DBUILD_SHARED_LIBS=ON \
      -DBUILD_PYTHON_WRAPPER=ON \
      -DPython3_EXECUTABLE=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/bin/python \
      -DPython3_INCLUDE_DIR=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/include/python3.11 \
      -DPython3_LIBRARY=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/lib/libpython3.11.dylib \
      -DPython3_NUMPY_INCLUDE_DIR=/opt/homebrew/Caskroom/mambaforge/base/envs/ros_env/lib/python3.11/site-packages/numpy/core/include

and use this specific environment at run time, then it should not matter that there are other environments installed on the machine, right?

christian-rauch commented 3 days ago

I am not sure how this works with the mixed homebrew and mambaforge environment. At least on Linux, if you have set up the environment (e.g. conda) correctly, you do not need to point to absolute Python paths manually, since CMake's FindPython3 module will locate the interpreter and set these variables accordingly. But I only have professional experience on Linux / Ubuntu, and macOS might just work differently here.
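If the paths do have to be passed by hand, one way to keep them consistent is to derive them from the interpreter that will later import the module, instead of typing them. The following is a sketch under assumptions: the helper names are hypothetical, `Python3_ROOT_DIR`, `Python3_EXECUTABLE` and `Python3_INCLUDE_DIR` are taken to be the hint variables of CMake's FindPython3 module, and `BUILD_PYTHON_WRAPPER` is the option used earlier in this thread.

```python
import shlex
import sys
import sysconfig


def cmake_python_args():
    """CMake -D options describing the interpreter that runs this script."""
    return [
        f"-DPython3_ROOT_DIR={sys.prefix}",
        f"-DPython3_EXECUTABLE={sys.executable}",
        f"-DPython3_INCLUDE_DIR={sysconfig.get_paths()['include']}",
    ]


def cmake_command(build_dir="build"):
    """Full configure command as one shell-quoted string."""
    argv = [
        "cmake", "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Debug",
        "-DBUILD_PYTHON_WRAPPER=ON",
    ] + cmake_python_args()
    return shlex.join(argv)
```

Running `print(cmake_command())` with the environment's own `python` produces a configure line whose Python paths all come from that one environment, which avoids picking up a different `python3` through `$(which python3)`.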

Since your environment is named ros_env, I am wondering, are you trying to use ROS on macOS? If so, you might want to look into conda and the RoboStack, which provides a lot of the ROS packages in a conda environment. I had very good experiences with using different Python stacks in a conda environment, including ROS.