bloomberg / pystack

🔍 🐍 Like pstack but for Python!
https://bloomberg.github.io/pystack
Apache License 2.0
1.01k stars 45 forks source link

CPython assertion failures cause odd behavior #199

Closed ZeroIntensity closed 1 month ago

ZeroIntensity commented 1 month ago

Is there an existing issue for this?

Current Behavior

Running with Python 3.12.5 (unreleased) compiled with --with-pydebug and --with-assertions seems to cause some odd behavior with PyStack, when the core file fails at an assertion.

Earlier today, I noticed PyStack segfaulted on a core file that did this (although, I'm struggling to reproduce this again), and I also observed it to have some highlighting issues (but not segfault) on a different case (but still failing at an assertion).

I've emailed a core file containing the incorrect highlight to @godlygeek, but I'm still working on coming up with a reproducer for the segfault.

Expected Behavior

It shouldn't do that :stuck_out_tongue:

Steps To Reproduce

  1. Clone the 3.12 branch of CPython
  2. ./configure --with-pydebug --with-assertions
  3. Add some dummy failed assertion somewhere. You could make a new function, add to an existing one, the choice is yours -- it just needs to fail and cause a coredump.
  4. Run with pystack core --native-all

Pystack Version

1.3.0

Python Version

3.12

Linux distribution

Arch Linux

Anything else?

No response

pablogsal commented 1 month ago

I cannot reproduce this:

$ ./python -m sysconfig | grep 'pydebug'
        CONFIG_ARGS = "'--with-pydebug' '--with-assertions'"
$ ./python --version
Python 3.12.5+
$ git --no-pager diff --no-color
diff --git a/Modules/_heapqmodule.c b/Modules/_heapqmodule.c
index 00285ae01f8..d35e674f8bb 100644
--- a/Modules/_heapqmodule.c
+++ b/Modules/_heapqmodule.c
@@ -146,6 +146,7 @@ heappop_internal(PyObject *heap, int siftup_func(PyListObject *, Py_ssize_t))

     /* raises IndexError if the heap is empty */
     n = PyList_GET_SIZE(heap);
+    assert(n == 0);
     if (n == 0) {
         PyErr_SetString(PyExc_IndexError, "index out of range");
         return NULL;

$ ./python -c 'import heapq; heapq.heappop([1,2,3])'
python: ./Modules/_heapqmodule.c:149: heappop_internal: Assertion `n == 0' failed.
[1]    1427571 IOT instruction (core dumped)  ./python -c 'import heapq; heapq.heappop([1,2,3])'

$ pystack core ~/github/python/3.12/core.python.1000.5dbb8d0016f64d8eb106608c574a9089.1372147.1723205627000000 --native
Using executable found in the core file: /home/pablogsal/github/python/3.12/python

Core file information:
state: R zombie: True niceness: 0
pid: 1372147 ppid: 1331964 sid: 1331964
uid: 1000 gid: 1000 pgrp: 1372147
executable: python arguments: ./python -c import heapq; heapq.heappop([1,2,3])

The process died due receiving signal SIGABRT
Traceback for thread 1372147 [Has the GIL] (most recent call last):
    (C) File "???", line 0, in _start (python)
    (C) File "../csu/libc-start.c", line 360, in __libc_start_main@@GLIBC_2.34 (libc.so.6)
    (C) File "../sysdeps/nptl/libc_start_call_main.h", line 58, in __libc_start_call_main (libc.so.6)
    (C) File "./Programs/python.c", line 15, in main (python)
    (C) File "Modules/main.c", line 767, in Py_BytesMain (python)
    (C) File "Modules/main.c", line 743, in pymain_main (python)
    (C) File "Modules/main.c", line 713, in Py_RunMain (python)
    (C) File "Modules/main.c", line 624, in pymain_run_python (python)
    (C) File "Modules/main.c", line 255, in pymain_run_command (python)
    (C) File "Python/pythonrun.c", line 480, in PyRun_SimpleStringFlags (python)
    (C) File "Python/pythonrun.c", line 1618, in PyRun_StringFlags (python)
    (C) File "Python/pythonrun.c", line 1743, in run_mod (python)
    (C) File "Python/pythonrun.c", line 1722, in run_eval_code_obj (python)
    (Python) File "<string>", line 1, in <module>
    (C) File "./Modules/clinic/_heapqmodule.c.h", line 68, in _heapq_heappop (_heapq.cpython-312d-x86_64-linux-gnu.so)
    (C) File "./Modules/_heapqmodule.c", line 187, in _heapq_heappop_impl (_heapq.cpython-312d-x86_64-linux-gnu.so)
    (C) File "./Modules/_heapqmodule.c", line 149, in _heapq__heappop_max_impl (_heapq.cpython-312d-x86_64-linux-gnu.so)
    (C) File "/usr/src/debug/glibc/glibc/assert/assert.c", line 94, in _nl_load_domain.cold (libc.so.6)
    (C) File "/usr/src/debug/glibc/glibc/stdlib/abort.c", line 79, in abort (libc.so.6)
    (C) File "../sysdeps/posix/raise.c", line 26, in raise (libc.so.6)
    (C) File "/usr/src/debug/glibc/glibc/nptl/pthread_kill.c", line 44, in __pthread_kill_implementation (libc.so.6)
pablogsal commented 1 month ago

This is also running on arch :)

ZeroIntensity commented 1 month ago

Odd, I'll see if I can come up with a more solid reproducer. Specifically, I was having trouble with assertion failures in _asyncio, could this somehow be specific to that module?

pablogsal commented 1 month ago

_asyncio, could this somehow be specific to that module?

Nop:

$ git --no-pager diff --no-color
diff --git a/Modules/_asynciomodule.c b/Modules/_asynciomodule.c
index 6b969edca29..e2a1d7a542f 100644
--- a/Modules/_asynciomodule.c
+++ b/Modules/_asynciomodule.c
@@ -179,6 +179,7 @@ _is_coroutine(asyncio_state *state, PyObject *coro)
        Do this check after 'future_init()'; in case we need to raise
        an error, __del__ needs a properly initialized object.
     */
+   assert(coro == NULL);
     PyObject *res = PyObject_CallOneArg(state->asyncio_iscoroutine_func, coro);
     if (res == NULL) {
         return -1;

$ ./python -m test test_asyncio
Using random seed: 246159022
0:00:00 load avg: 1.65 Run 30 tests sequentially
0:00:00 load avg: 1.65 [ 1/30] test_asyncio.test_base_events
python: ./Modules/_asynciomodule.c:182: _is_coroutine: Assertion `coro == NULL' failed.
Fatal Python error: Aborted

Current thread 0x000071d15de54740 (most recent call first):
  File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 458 in create_task
  File "/home/pablogsal/github/python/3.12/Lib/asyncio/events.py", line 88 in _run
  File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 1986 in _run_once
  File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 641 in run_forever
  File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 674 in run_until_complete
  File "/home/pablogsal/github/python/3.12/Lib/test/test_asyncio/utils.py", line 111 in run_briefly
  File "/home/pablogsal/github/python/3.12/Lib/test/test_asyncio/test_base_events.py", line 987 in test_asyncgen_finalization_by_gc
  File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 589 in _callTestMethod
  File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 634 in run
  File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 690 in __call__
  File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 122 in run
  File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 84 in __call__
  File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 122 in run
  File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 84 in __call__
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/testresult.py", line 148 in run
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 135 in test_func
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 91 in regrtest_runner
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 138 in _load_run_test
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 181 in _runtest_env_changed_exc
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 281 in _runtest
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 307 in run_single_test
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 350 in run_test
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 385 in run_tests_sequentially
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 524 in _run_tests
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 558 in run_tests
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 721 in main
  File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 729 in main
  File "/home/pablogsal/github/python/3.12/Lib/test/__main__.py", line 2 in <module>
  File "/home/pablogsal/github/python/3.12/Lib/runpy.py", line 88 in _run_code
  File "/home/pablogsal/github/python/3.12/Lib/runpy.py", line 198 in _run_module_as_main
[1]    2772445 IOT instruction (core dumped)  ./python -m test test_asyncio

$ pystack core ~/github/python/3.12/core.python.1000.5dbb8d0016f64d8eb106608c574a9089.2738263.1723210395000000 --native-all 
Using executable found in the core file: /home/pablogsal/github/python/3.12/python

Core file information:
state: R zombie: True niceness: 0
pid: 2738263 ppid: 1331964 sid: 1331964
uid: 1000 gid: 1000 pgrp: 2738263
executable: python arguments: ./python -m test test_asyncio

The process died due receiving signal SIGABRT
Traceback for thread 2738263 [Has the GIL] (most recent call last):
    (C) File "???", line 0, in _start (python)
    (C) File "../csu/libc-start.c", line 360, in __libc_start_main@@GLIBC_2.34 (libc.so.6)
    (C) File "../sysdeps/nptl/libc_start_call_main.h", line 58, in __libc_start_call_main (libc.so.6)
    (C) File "./Programs/python.c", line 15, in main (python)
    (C) File "Modules/main.c", line 767, in Py_BytesMain (python)
    (C) File "Modules/main.c", line 743, in pymain_main (python)
    (C) File "Modules/main.c", line 713, in Py_RunMain (python)
    (C) File "Modules/main.c", line 627, in pymain_run_python (python)
    (C) File "Modules/main.c", line 300, in pymain_run_module (python)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/runpy.py", line 198, in _run_module_as_main
        return _run_code(code, main_globals, None,
    (Python) File "/home/pablogsal/github/python/3.12/Lib/runpy.py", line 88, in _run_code
        exec(code, run_globals)
    (C) File "Python/clinic/bltinmodule.c.h", line 586, in builtin_exec (python)
    (C) File "Python/bltinmodule.c", line 1096, in builtin_exec_impl (python)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/__main__.py", line 2, in <module>
        main(_add_python_opts=True)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 729, in main
        Regrtest(ns, _add_python_opts=_add_python_opts).main(tests=tests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 721, in main
        exitcode = self.run_tests(selected, tests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 558, in run_tests
        return self._run_tests(selected, tests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 524, in _run_tests
        tracer = self.run_tests_sequentially(runtests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 385, in run_tests_sequentially
        result = self.run_test(test_name, runtests, tracer)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/main.py", line 350, in run_test
        result = run_single_test(test_name, runtests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 307, in run_single_test
        _runtest(result, runtests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 281, in _runtest
        _runtest_env_changed_exc(result, runtests,
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 181, in _runtest_env_changed_exc
        _load_run_test(result, runtests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 138, in _load_run_test
        regrtest_runner(result, test_func, runtests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 91, in regrtest_runner
        test_result = test_func()
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 135, in test_func
        return run_unittest(test_mod)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 37, in run_unittest
        return _run_suite(tests)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/single.py", line 57, in _run_suite
        result = runner.run(suite)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/libregrtest/testresult.py", line 148, in run
        test(self.result)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 84, in __call__
        return self.run(*args, **kwds)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 122, in run
        test(result)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 84, in __call__
        return self.run(*args, **kwds)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/suite.py", line 122, in run
        test(result)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 690, in __call__
        return self.run(*args, **kwds)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 634, in run
        self._callTestMethod(testMethod)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/unittest/case.py", line 589, in _callTestMethod
        if method() is not None:
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/test_asyncio/test_base_events.py", line 987, in test_asyncgen_finalization_by_gc
        test_utils.run_briefly(self.loop)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/test/test_asyncio/utils.py", line 111, in run_briefly
        loop.run_until_complete(t)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 674, in run_until_complete
        self.run_forever()
    (Python) File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 641, in run_forever
        self._run_once()
    (Python) File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 1986, in _run_once
        handle._run()
    (Python) File "/home/pablogsal/github/python/3.12/Lib/asyncio/events.py", line 88, in _run
        self._context.run(self._callback, *self._args)
    (C) File "Python/context.c", line 668, in context_run (python)
    (Python) File "/home/pablogsal/github/python/3.12/Lib/asyncio/base_events.py", line 458, in create_task
        task = tasks.Task(coro, loop=self, name=name, context=context)
    (C) File "./Modules/clinic/_asynciomodule.c.h", line 565, in _asyncio_Task___init__ (_asyncio.cpython-312d-x86_64-linux-gnu.so)
    (C) File "./Modules/_asynciomodule.c", line 2110, in _asyncio_Task___init___impl (_asyncio.cpython-312d-x86_64-linux-gnu.so)
    (C) File "./Modules/_asynciomodule.c", line 228, in is_coroutine (_asyncio.cpython-312d-x86_64-linux-gnu.so)
    (C) File "./Modules/_asynciomodule.c", line 182, in _is_coroutine (_asyncio.cpython-312d-x86_64-linux-gnu.so)
    (C) File "/usr/src/debug/glibc/glibc/assert/assert.c", line 94, in _nl_load_domain.cold (libc.so.6)
    (C) File "/usr/src/debug/glibc/glibc/stdlib/abort.c", line 79, in abort (libc.so.6)
    (C) File "../sysdeps/posix/raise.c", line 26, in raise (libc.so.6)
    (C) File "/usr/src/debug/glibc/glibc/nptl/pthread_kill.c", line 44, in __pthread_kill_implementation (libc.so.6)
    (C) File "???", line 0, in __restore_rt (libc.so.6)
    (C) File "./Modules/faulthandler.c", line 329, in faulthandler_fatal_error (python)
    (C) File "../sysdeps/posix/raise.c", line 26, in raise (libc.so.6)
    (C) File "/usr/src/debug/glibc/glibc/nptl/pthread_kill.c", line 44, in __pthread_kill_implementation (libc.so.6)
ZeroIntensity commented 1 month ago

Ok, I'll look for a proper reproducer a bit later today. In the meantime, @godlygeek should have forwarded you some core files that cause problems with PyStack (both of which, were related to assertion errors in _asyncio).

pablogsal commented 1 month ago

Ok, I'll look for a proper reproducer a bit later today. In the meantime, @godlygeek should have forwarded you some core files that cause problems with PyStack (both of which, were related to assertion errors in _asyncio).

Yeah I looked at them but there is something wrong with those core files because all the symbols are in weird places. I think I need the original executable as well (maybe the entire CPython folder zipped would be better)

ZeroIntensity commented 1 month ago

Here's what I exactly did to get the segfault when I was debugging yesterday, can you see if it works on your end?

import asyncio
import _asyncio
import gc

async def main():
    it = iter(asyncio.Future())
    del it
    xs = gc.get_referents(_asyncio)

asyncio.run(main())

Running pystack core on the generated core file will cause PyStack to segfault. (Apologies for the long instructions, but this reliably produces a segfault on my end.)

pablogsal commented 1 month ago

Thanks a lot @ZeroIntensity for the reproducer. I think I found the problem!