Open andreaimprovised opened 1 month ago
I've seen something similar in our testing setup. We use async subscribers from FastStream, but the end result seems to be the same, since it uses confluent under the hood.
At first it appeared to be related to our pants+nix test setup on Python 3.11, but I have reproduced it running pytest within a venv on 3.12 as well.
I also run our tests within a docker compose application, and have never seen it occur there.
The main difference between the environments is linux_aarch64 versus macos_amd64, I suppose, but I have also noticed that the async selector is different: KqueueSelector locally, EpollSelector within the containers.
A coworker not on macOS has reported the issue as well, and it has also appeared in CI (GH Actions), so it is seemingly not macOS specific.
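For anyone comparing environments, the selector can be checked with the standard library alone. This is a minimal sketch: it assumes asyncio's selector-based event loop is in use, which falls back to selectors.DefaultSelector when no selector is passed explicitly. The helper name default_selector_name is made up for illustration.

```python
import selectors
import sys


def default_selector_name() -> str:
    """Name of the selector class the selectors module prefers on this platform."""
    return selectors.DefaultSelector.__name__


# Typically KqueueSelector on macOS and EpollSelector on Linux.
print(sys.platform, default_selector_name())
```

Running this both locally and inside the container shows whether the selector really differs between the two setups.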
(venv) ~/org/components/gsf git:GSF-654-convert-threaded* -> python -m pytest
===================================================== test session starts =====================================================
platform darwin -- Python 3.12.5, pytest-7.4.4, pluggy-1.5.0
rootdir: /Users/ntr/org/components/gsf
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-0.23.8, env-1.1.3, cov-4.1.0, anyio-3.7.1, docker-3.1.1
asyncio: mode=Mode.AUTO
collecting ... Variable Name: ENABLE_METRICS, Value: true
Variable Name: ENABLE_LOADTESTING, Value: None
Variable Name: SEND_NOTIFICATIONS, Value: true
Variable Name: ENABLE_SENTRY, Value: None
Variable Name: ENABLE_SENTRY_TRACING, Value: None
Variable Name: IEEE_LOGGING_CLOUDWATCH, Value: None
collected 108 items
tests/gsf_tests/test_adaptors.py .... [ 3%]
tests/gsf_tests/test_config_manager.py .. [ 5%]
tests/gsf_tests/test_control_responses.py ..Fatal Python error: Segmentation fault
Current thread 0x000000016ea73000 (most recent call first):
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 807 in run
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1032 in _bootstrap
Thread 0x00000001f57a8f40 (most recent call first):
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 1289 in create_module
File "<frozen importlib._bootstrap>", line 813 in module_from_spec
File "<frozen importlib._bootstrap>", line 921 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1415 in _handle_fromlist
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/multidict/_compat.py", line 12 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/multidict/__init__.py", line 9 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/aiohttp/hdrs.py", line 7 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1415 in _handle_fromlist
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/aiohttp/__init__.py", line 5 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "/Users/ntr/org/components/gsf/src/gsf/data_plane/entrypoints/upstream.py", line 11 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "/Users/ntr/org/components/gsf/src/gsf/data_plane/entrypoints/control_responses_subscriber.py", line 8 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 995 in exec_module
File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
File "/Users/ntr/org/components/gsf/tests/gsf_tests/test_control_responses.py", line 86 in test_control_responses_subscriber
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/events.py", line 88 in _run
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 1986 in _run_once
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 641 in run_forever
File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 674 in run_until_complete
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 906 in inner
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/python.py", line 1792 in runtest
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 440 in runtest
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 133 in runtestprotocol
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
File "/Users/ntr/org/components/gsf/venv/lib/python3.12/site-packages/pytest/__main__.py", line 5 in <module>
File "<frozen runpy>", line 88 in _run_code
File "<frozen runpy>", line 198 in _run_module_as_main
Extension modules: lxml._elementpath, lxml.etree, _cffi_backend, confluent_kafka.cimpl, charset_normalizer.md (total: 5)
[1] 14261 segmentation fault python -m pytest
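The per-thread Python stacks in the report above are in the format written by the standard library's faulthandler module, which pytest enables by default. The same kind of dump can be requested on demand, for example from a test that hangs instead of crashing. A minimal sketch using only the standard library (dump_all_threads is a hypothetical helper name):

```python
import faulthandler
import tempfile


def dump_all_threads() -> str:
    """Return a faulthandler-style traceback for every running thread."""
    with tempfile.TemporaryFile(mode="w+") as f:
        # faulthandler writes straight to the file descriptor.
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


report = dump_all_threads()
print(report)
```

Outside pytest, calling faulthandler.enable() early in the process gives the same crash report on a segmentation fault.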
The test is basically this:
message_received_event = Event()

async def subscriber(event: CloudEvent):
    assert event.data
    if mrid in event.data.get("request_body"):
        message_received_event.set()

broker.subscriber(CONTROL_RESPONSES_TOPIC["name"], group_id="test_event_router")(subscriber)

# Start listening
await broker.start()

payload = get_control_response_payload(status=1, subject=mrid)
res = await async_der_client.post(url=url, content=payload)
assert res.status_code == 201

# Wait for the event to be set with a timeout to avoid hanging indefinitely
try:
    await asyncio.wait_for(message_received_event.wait(), timeout=5.0)
except asyncio.TimeoutError:
    log.error("Timed out waiting for message received event")

assert message_received_event.is_set()
await broker.close()
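Stripped of the broker and the HTTP client, the waiting logic in that test reduces to the pattern below. It is a self-contained sketch, not FastStream code: wait_for_message and fake_subscriber are hypothetical stand-ins, and the "message" is delivered by a plain asyncio task.

```python
import asyncio


async def wait_for_message(event: asyncio.Event, timeout: float) -> bool:
    """Wait until the event is set; return False instead of raising on timeout."""
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def main() -> tuple[bool, bool]:
    received = asyncio.Event()

    async def fake_subscriber() -> None:
        # Stand-in for the broker callback: "receives" a message shortly after start.
        await asyncio.sleep(0.01)
        received.set()

    task = asyncio.create_task(fake_subscriber())
    delivered = await wait_for_message(received, timeout=1.0)
    await task

    # An event that nobody sets times out and reports False.
    never_set = await wait_for_message(asyncio.Event(), timeout=0.05)
    return delivered, never_set


result = asyncio.run(main())
print(result)
```

If this pattern passes reliably on its own, the crash is unlikely to come from the event handling itself, which points back at the consumer underneath.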
Reproduced it with lldb, looks pretty cut and dry:
tests/gsf_tests/test_config_manager.py .. [ 5%]
Process 23727 stopped
* thread #9, stop reason = EXC_BAD_ACCESS (code=1, address=0xae8)
frame #0: 0x0000000105d9b5b8 librdkafka.dylib`rd_kafka_q_pop_serve + 888
librdkafka.dylib`rd_kafka_q_pop_serve:
-> 0x105d9b5b8 <+888>: ldr w8, [x19, #0xae8]
0x105d9b5bc <+892>: cmp w8, #0x1
0x105d9b5c0 <+896>: b.ne 0x105d9b608 ; <+968>
0x105d9b5c4 <+900>: add x0, sp, #0x10
Target 0: (Python) stopped.
(lldb) bt
* thread #9, stop reason = EXC_BAD_ACCESS (code=1, address=0xae8)
* frame #0: 0x0000000105d9b5b8 librdkafka.dylib`rd_kafka_q_pop_serve + 888
frame #1: 0x0000000105d70488 librdkafka.dylib`rd_kafka_consume0 + 236
frame #2: 0x0000000102959cb8 cimpl.cpython-312-darwin.so`Consumer_poll + 132
frame #3: 0x0000000100ac26d0 Python`cfunction_call + 72
frame #4: 0x0000000100a70c94 Python`_PyObject_MakeTpCall + 128
frame #5: 0x0000000100b849f4 Python`context_run + 104
frame #6: 0x0000000100ac1db8 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 92
frame #7: 0x0000000100b66bf8 Python`_PyEval_EvalFrameDefault + 50272
frame #8: 0x0000000100a73d5c Python`method_vectorcall + 372
frame #9: 0x0000000100c36720 Python`thread_run + 144
frame #10: 0x0000000100bcc850 Python`pythread_wrapper + 48
frame #11: 0x000000018d959f94 libsystem_pthread.dylib`_pthread_start + 136
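One detail that can be read off the disassembly: the faulting instruction is ldr w8, [x19, #0xae8] and the reported bad address is exactly 0xae8, so the base register x19 must have been zero. That makes this look like a field read at offset 0xae8 through a NULL pointer rather than a wild pointer. Which structure x19 was supposed to point to is not visible from this output. The arithmetic, as a trivial check (base_register_value is a made-up helper name):

```python
def base_register_value(fault_address: int, displacement: int) -> int:
    """For a load of the form [base, #displacement], recover the base value."""
    return fault_address - displacement


# Values copied from the lldb output above.
fault_address = 0xAE8  # EXC_BAD_ACCESS (code=1, address=0xae8)
displacement = 0xAE8   # ldr w8, [x19, #0xae8]

x19 = base_register_value(fault_address, displacement)
print(hex(x19))
```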
Description
I have a pytest suite that will:
This test suite is notoriously prone to segmentation faults and the like that crash the entire interpreter and are very disruptive.
I have heard that confluent_kafka is thread safe, but I have not experienced that to be the case. I'm open to the possibility that this is user error on my part. If so, please, show me the way.
The errors tend to happen after the first test case has run, during the second test case, while the Consumer is attempting to monitor for assignments.
There are many different presentations:
Additionally, if the timeout argument to poll() is set, the consumer never appears to get an assignment at all.
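One pattern that may be worth trying while narrowing this down is to make sure close() can never overlap with a poll() running on another thread. The sketch below is an illustration under assumptions, not a confirmed fix: SerializedConsumer is a hypothetical wrapper, and FakeConsumer stands in for a real consumer so the snippet runs without a broker. It relies only on the wrapped object having poll(timeout) and close() methods.

```python
import threading


class SerializedConsumer:
    """Hypothetical wrapper: poll() and close() never run concurrently."""

    def __init__(self, consumer):
        self._consumer = consumer
        self._lock = threading.Lock()
        self._closed = False

    def poll(self, timeout: float = 0.1):
        with self._lock:
            if self._closed:
                return None  # Refuse to touch a closed handle.
            return self._consumer.poll(timeout)

    def close(self) -> None:
        with self._lock:
            if not self._closed:
                self._closed = True
                self._consumer.close()


class FakeConsumer:
    """Stand-in for a real consumer; records whether poll ran after close."""

    def __init__(self):
        self.closed = False
        self.polled_after_close = False

    def poll(self, timeout):
        if self.closed:
            self.polled_after_close = True
        return None

    def close(self):
        self.closed = True


fake = FakeConsumer()
consumer = SerializedConsumer(fake)
stop = threading.Event()


def poll_loop():
    while not stop.is_set():
        consumer.poll(0.01)


worker = threading.Thread(target=poll_loop)
worker.start()
consumer.close()  # Waits for any in-flight poll, then closes.
stop.set()
worker.join()
print(fake.closed, fake.polled_after_close)
```

Because the lock is held for the duration of each poll, close() can block for up to one poll timeout, so short timeouts keep teardown quick.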
How to reproduce
Here's a pretty elaborate script that reliably reproduces the total range of errors that I see with a lot of different knobs to tweak.
Assignments never happening when poll(1.0)
https://gist.github.com/andreaimprovised/6221cba7c0be98ee3189dd517998bda3
INTERNAL ERROR with 3 topics
https://gist.github.com/andreaimprovised/d80eedeea6ef7beb44fff228df1942da
segmentation fault with 12 topics
https://gist.github.com/andreaimprovised/5bc6acdc05fecb35d7cb7f20295c31f7
Segmentation fault with just 1 topic and 1 message per test case
https://gist.github.com/andreaimprovised/1fe7b9f8be0d34d8a6dc40827c802934
Additional requirements:
I'm currently using python 3.10.
Checklist
Please provide the following information:
- confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): In [2]: confluent_kafka.version() Out[2]: ('2.5.0', 33882112)
- Apache Kafka broker version: confluentinc/cp-kafka:7.6.0
- Client configuration ({...}): It's in the code.
- Operating system: I've seen this on darwin arm64 and linux x86_64.
- Client logs (with 'debug': '..' as necessary): Here is an example
- Broker log excerpts: Hmmm, I'll try to figure out how to get these.
- Critical issue: It's not critical, but it depends on how critical you think automated test suites are.
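The integer in the version tuple above is the same version in packed form: 33882112 is 0x02050000, which reads as 2.5.0 when split into bytes. A small sketch of that decoding, assuming the one-byte-per-component layout (major, minor, revision, plus a trailing byte) that this value suggests; decode_version is a made-up helper name:

```python
def decode_version(packed: int) -> tuple[int, int, int, int]:
    """Split a 32-bit packed version into its four bytes, most significant first."""
    return (
        (packed >> 24) & 0xFF,
        (packed >> 16) & 0xFF,
        (packed >> 8) & 0xFF,
        packed & 0xFF,
    )


packed = 33882112  # Second element of confluent_kafka.version() in the report.
print(hex(packed), decode_version(packed))
```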