Closed byllyfish closed 1 year ago
See grpc/grpc#31441
Here's a smaller reproducer. I think the problem is that GRPC has close code that calls task.get_stack(limit=1). This seg-faults with certain tasks, perhaps because they are created with cython?
import asyncio
import finsy as fy
from finsy.test.gnmi_server import GNMIServer
def get_stack():
tasks = asyncio.all_tasks()
for task in tasks:
print(task)
task.get_stack(limit=1) # seg faults: _task_get_stack
async def main():
async with GNMIServer("127.0.0.1:19559").run():
async with fy.GNMIClient("127.0.0.1:19559") as client:
sub = client.subscribe()
sub.once(fy.GNMIPath("/"))
await sub._subscribe()
get_stack()
asyncio.run(main())
The output on my machine looks like this:
$ python -X dev demo2.py
<Task pending name='Task-1' coro=<main() running at /Users/bfish/code/finsy/demo2.py:21> cb=[_run_until_complete_cb() at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:180] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py:100>
<Task pending name='Task-2' coro=<AioServer._server_main_loop()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[AioServer._serving_task_crash_handler()] created at /Users/bfish/code/finsy/.venv311/lib/python3.11/site-packages/grpc/aio/_server.py:112>
<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py:80>
Fatal Python error: Segmentation fault
Thread 0x0000700002207000 (most recent call first):
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 975 in run
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 995 in _bootstrap
Current thread 0x000000010f26c600 (most recent call first):
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_tasks.py", line 53 in _task_get_stack
File "/Users/bfish/code/finsy/demo2.py", line 11 in get_stack
File "/Users/bfish/code/finsy/demo2.py", line 21 in main
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py", line 80 in _run
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 1904 in _run_once
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py", line 118 in run
File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py", line 190 in run
File "/Users/bfish/code/finsy/demo2.py", line 24 in <module>
Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
[1] 20930 segmentation fault python -X dev demo2.py
Note that the offending task's coroutine is cython...
<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py:80>
Fatal Python error: Segmentation fault
There is a Python 3.11 bug that may be related: python/cpython#99110.
The seg. fault happens on this line of code trying to access f_back
.
Still an invalid memory access on Python 3.11.1. Here's the same program with python 3.11.1 running under valgrind on ubuntu:
$ valgrind --track-origins=yes python -X dev demo.py
==19645== Memcheck, a memory error detector
==19645== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19645== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==19645== Command: python -X dev demo.py
==19645==
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:15> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py:100> took 0.923 seconds
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:19> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py:100> took 0.252 seconds
Executing <Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632> took 0.306 seconds
<Task pending name='Task-2' coro=<AioServer._server_main_loop()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[AioServer._serving_task_crash_handler()] created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_server.py:112>
<Task pending name='Task-4' coro=<_AioCall._handle_status_once_received()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643>
<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/events.py:80>
==19645== Invalid read of size 1
==19645== at 0x2859F0: _PyFrame_IsIncomplete (pycore_frame.h:147)
==19645== by 0x2859F0: PyFrame_GetBack (frameobject.c:1326)
==19645== by 0x2859F0: frame_getback (frameobject.c:103)
==19645== by 0x2BBFAE: _PyObject_GenericGetAttrWithDict (object.c:1278)
==19645== by 0x2BB46A: PyObject_GetAttr (object.c:916)
==19645== by 0x1F9B17: _PyEval_EvalFrameDefault (ceval.c:3464)
==19645== by 0x3699E0: _PyEval_EvalFrame (pycore_ceval.h:73)
==19645== by 0x3699E0: _PyEval_Vector (ceval.c:6435)
==19645== by 0x25F573: _PyObject_VectorcallTstate (pycore_call.h:92)
==19645== by 0x25F573: object_vacall (call.c:819)
==19645== by 0x261F3B: PyObject_CallFunctionObjArgs (call.c:925)
==19645== by 0x6624E95: _asyncio_Task_get_stack_impl (_asynciomodule.c:2304)
==19645== by 0x6624E95: _asyncio_Task_get_stack (_asynciomodule.c.h:518)
==19645== by 0x26D83F: method_vectorcall_FASTCALL_KEYWORDS (descrobject.c:426)
==19645== by 0x25FD9E: _PyObject_VectorcallTstate (pycore_call.h:92)
==19645== by 0x25FD9E: PyObject_Vectorcall (call.c:299)
==19645== by 0x1FBA61: _PyEval_EvalFrameDefault (ceval.c:4772)
==19645== by 0x279E04: _PyEval_EvalFrame (pycore_ceval.h:73)
==19645== by 0x279E04: gen_send_ex2 (genobject.c:219)
==19645== by 0x279E04: PyGen_am_send (genobject.c:280)
==19645== Address 0xcdcdcdcdcdcdce12 is not stack'd, malloc'd or (recently) free'd
==19645==
Fatal Python error: Segmentation fault
Thread 0x000000000aabd640 (most recent call first):
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 975 in run
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 995 in _bootstrap
Current thread 0x0000000004b72b80 (most recent call first):
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_tasks.py", line 53 in _task_get_stack
File "/home/bfish/code/finsy/demo.py", line 11 in get_stack
File "/home/bfish/code/finsy/demo.py", line 21 in main
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/events.py", line 80 in _run
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 1911 in _run_once
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py", line 118 in run
File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py", line 190 in run
File "/home/bfish/code/finsy/demo.py", line 24 in <module>
Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
==19645==
==19645== Process terminating with default action of signal 11 (SIGSEGV)
==19645== at 0x49E0A7C: __pthread_kill_implementation (pthread_kill.c:44)
==19645== by 0x49E0A7C: __pthread_kill_internal (pthread_kill.c:78)
==19645== by 0x49E0A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==19645== by 0x498C475: raise (raise.c:26)
==19645== by 0x498C51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==19645== by 0x2859EF: PyFrame_GetBack (frameobject.c:1326)
==19645== by 0x2859EF: frame_getback (frameobject.c:103)
==19645==
==19645== HEAP SUMMARY:
==19645== in use at exit: 6,454,922 bytes in 7,966 blocks
==19645== total heap usage: 19,529 allocs, 11,563 frees, 30,342,880 bytes allocated
==19645==
==19645== LEAK SUMMARY:
==19645== definitely lost: 0 bytes in 0 blocks
==19645== indirectly lost: 0 bytes in 0 blocks
==19645== possibly lost: 5,983,687 bytes in 4,258 blocks
==19645== still reachable: 471,235 bytes in 3,708 blocks
==19645== of which reachable via heuristic:
==19645== stdstring : 2,575 bytes in 50 blocks
==19645== newarray : 360 bytes in 1 blocks
==19645== suppressed: 0 bytes in 0 blocks
==19645== Rerun with --leak-check=full to see details of leaked memory
==19645==
==19645== For lists of detected and suppressed errors, rerun with: -s
==19645== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
There's an assert fail in the debug build of python 3.11.1, when printing information about the cython task.
Edit: The ResourceWarning is due to upb
violating the defensive assertion added in https://github.com/python/cpython/issues/95324
$ valgrind python -X dev demo.py
==34680== Memcheck, a memory error detector
==34680== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==34680== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==34680== Command: python -X dev demo.py
==34680==
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:15> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py:100> took 0.798 seconds
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:19> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py:100> took 0.319 seconds
Executing <Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632> took 0.289 seconds
<Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632>
python: Objects/codeobject.c:771: PyCode_Addr2Line: Assertion `addrq >= 0 && addrq < _PyCode_NBYTES(co)' failed.
Fatal Python error: Aborted
Thread 0x000000000abf6640 (most recent call first):
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 975 in run
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 995 in _bootstrap
Current thread 0x0000000004b72b80 (most recent call first):
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/coroutines.py", line 104 in _format_coroutine
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_tasks.py", line 18 in _task_repr_info
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_tasks.py", line 28 in _task_repr
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/reprlib.py", line 21 in wrapper
File "/home/bfish/code/finsy/demo.py", line 10 in get_stack
File "/home/bfish/code/finsy/demo.py", line 21 in main
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/events.py", line 80 in _run
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 1911 in _run_once
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py", line 118 in run
File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py", line 190 in run
File "/home/bfish/code/finsy/demo.py", line 24 in <module>
Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
==34680==
==34680== Process terminating with default action of signal 6 (SIGABRT)
==34680== at 0x49E0A7C: __pthread_kill_implementation (pthread_kill.c:44)
==34680== by 0x49E0A7C: __pthread_kill_internal (pthread_kill.c:78)
==34680== by 0x49E0A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==34680== by 0x498C475: raise (raise.c:26)
==34680== by 0x4E0986: faulthandler_fatal_error (faulthandler.c:385)
==34680== by 0x498C51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==34680== by 0x49E0A7B: __pthread_kill_implementation (pthread_kill.c:43)
==34680== by 0x49E0A7B: __pthread_kill_internal (pthread_kill.c:78)
==34680== by 0x49E0A7B: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==34680==
==34680== HEAP SUMMARY:
==34680== in use at exit: 6,558,722 bytes in 8,085 blocks
==34680== total heap usage: 19,966 allocs, 11,881 frees, 31,248,068 bytes allocated
==34680==
==34680== LEAK SUMMARY:
==34680== definitely lost: 0 bytes in 0 blocks
==34680== indirectly lost: 0 bytes in 0 blocks
==34680== possibly lost: 6,088,930 bytes in 4,389 blocks
==34680== still reachable: 469,792 bytes in 3,696 blocks
==34680== of which reachable via heuristic:
==34680== stdstring : 2,575 bytes in 50 blocks
==34680== newarray : 360 bytes in 1 blocks
==34680== suppressed: 0 bytes in 0 blocks
==34680== Rerun with --leak-check=full to see details of leaked memory
==34680==
==34680== For lists of detected and suppressed errors, rerun with: -s
==34680== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)
Fixed in Python 3.11.2. (https://github.com/python/cpython/issues/99110.)
Using grpcio 1.50.0 and Python 3.11.0, the test
test_switch1
will fail with a segmentation fault.If I disable the test P4RuntimeServer with an environment variable, the test succeeds.
It seems like there is a memory misbehavior bug in the grpc async server implementation, but client code is tripping over it in _task_get_stack. If I run client code without running any of the server test code, there is no segmentation fault.
For now, the CI tests for Python 3.11 are running with FINSY_TEST_P4RUNTIME_SERVER=skip and FINSY_TEST_GNMI_SERVER=skip.