codypiersall / pynng

Python bindings for Nanomsg Next Generation.
https://pynng.readthedocs.io
MIT License

Issue with forked process, spawn works fine #113

Open lhjnilsson opened 1 year ago

lhjnilsson commented 1 year ago

Hey,

I have an issue where pynng crashes with a heavy core dump if I initialize a socket after a process is started. This works on Mac but fails on Ubuntu Linux for me (GitHub Actions and a Docker container on a Mac M1).

My guess is that this comes down to fork vs. spawn: when starting a new process in Python, Mac uses spawn by default and Linux uses fork. See https://docs.python.org/3/library/multiprocessing.html (Contexts and start methods).
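A minimal sketch of how the start method could be pinned to spawn on every platform, so Linux behaves like the Mac setup described above. The names `get_worker_context`, `worker` and `start_workers` are hypothetical and not taken from the linked code; the place where a real worker would open its pynng sockets is only marked with a comment.

```python
import multiprocessing as mp
import os


def get_worker_context():
    # Ask for the "spawn" start method explicitly instead of relying on the
    # platform default (fork on Linux, spawn on macOS).
    return mp.get_context("spawn")


def worker(name):
    # Hypothetical worker body: a real worker would create its pynng
    # sockets here, inside the freshly spawned interpreter.
    print(f"{name} running in pid {os.getpid()}")


def start_workers(count):
    ctx = get_worker_context()
    procs = [
        ctx.Process(target=worker, args=(f"worker-{i}",)) for i in range(count)
    ]
    for proc in procs:
        proc.start()
    return procs


if __name__ == "__main__":
    # The guard matters with spawn: each child re-imports this module.
    for proc in start_workers(2):
        proc.join()
        print(proc.name, "exit code", proc.exitcode)
```

Using a context object rather than `multiprocessing.set_start_method` keeps the choice local to the worker pool, which may be preferable inside a test suite where other code expects the default.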

My scenario is "Workers" running as processes, each with three sockets: one Respondent0 to respond to configuration changes, one Pub0 to publish state (e.g. ready), and one Rep0 through which it fetches workload from a remote target, which in my scenario is OHLC (Open High Low Close) stock data. A "WorkerPool" holds the Surveyor0, sends configurations, and listens on a Sub0 for state changes.

Code: https://github.com/quantfamily/python/blob/fddc8c4b64c0696c02467ece0c14c1a7679bbae4/src/client/foreverbull/worker/worker.py

Test: https://github.com/quantfamily/python/blob/fddc8c4b64c0696c02467ece0c14c1a7679bbae4/src/client/tests/worker/test_worker.py

Outcome:

../../tests/test_simply.py::test_new_pool panic: nng is not fork-reentrant safe
This message is indicative of a BUG.
Report this at https://github.com/nanomsg/nng/issues
panic: nng is not fork-reentrant safe
This message is indicative of a BUG.
Report this at https://github.com/nanomsg/nng/issues
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_panic+0xec) [0x400435040c]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_panic+0xec) [0x400435040c]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_plat_init+0x12e) [0x400435b4be]

/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_proto_open+0x1a) [0x4004350d9a]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(+0x430ff) [0x400430e0ff]
/usr/local/lib/libpython3.10.so.1.0(+0x15507a) [0x400199207a]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_plat_init+0x12e) [0x400435b4be]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(nni_proto_open+0x1a) [0x4004350d9a]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/python3.10/site-packages/pynng/_nng.abi3.so(+0x430ff) [0x400430e0ff]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x16d) [0x400198abcd]
/usr/local/lib/libpython3.10.so.1.0(+0x15507a) [0x400199207a]
/usr/local/lib/libpython3.10.so.1.0(+0x15ef35) [0x400199bf35]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x273) [0x400198ba23]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x16d) [0x400198abcd]
/usr/local/lib/libpython3.10.so.1.0(+0x15ef35) [0x400199bf35]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x273) [0x400198ba23]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x582d) [0x4001986b9d]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x582d) [0x4001986b9d]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(+0x161fe2) [0x400199efe2]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x1300) [0x4001982670]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0xc8) [0x400198ab28]
/usr/local/lib/libpython3.10.so.1.0(+0x15eb42) [0x400199bb42]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x273) [0x400198ba23]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x50db) [0x400198644b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(+0x161fe2) [0x400199efe2]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x1300) [0x4001982670]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0xc8) [0x400198ab28]
/usr/local/lib/libpython3.10.so.1.0(PyObject_Call+0xb1) [0x400199fc71]
/usr/local/lib/libpython3.10.so.1.0(+0x15ef35) [0x400199bf35]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x273) [0x400198ba23]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x50db) [0x400198644b]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(PyObject_Call+0xb1) [0x400199fc71]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(+0x161fe2) [0x400199efe2]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x16d) [0x400198abcd]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(+0x161fe2) [0x400199efe2]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x4bab) [0x4001985f1b]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x16d) [0x400198abcd]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_Call_Prepend+0x4c) [0x400199c83c]
/usr/local/lib/libpython3.10.so.1.0(+0x2372ec) [0x4001a742ec]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x363) [0x400198bb13]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x582d) [0x4001986b9d]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_Call_Prepend+0x4c) [0x400199c83c]
/usr/local/lib/libpython3.10.so.1.0(+0x2372ec) [0x4001a742ec]
/usr/local/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x363) [0x400198bb13]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x582d) [0x4001986b9d]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x6df) [0x4001981a4f]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
/usr/local/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x29f6) [0x4001983d66]
/usr/local/lib/libpython3.10.so.1.0(_PyFunction_Vectorcall+0x78) [0x4001992aa8]
Fatal Python error: Aborted

Current thread 0x0000004001f3a4c0 (most recent call first):
  File "/usr/local/lib/python3.10/site-packages/pynng/nng.py", line 315 in __init__
  File "/build/client/foreverbull/worker/worker.py", line 50 in run
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/local/lib/python3.10/multiprocessing/context.py", line 281 in _Popen
  File "/usr/local/lib/python3.10/multiprocessing/context.py", line 224 in _Popen
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 121 in start
  File "/build/client/foreverbull/worker/worker.py", line 145 in setup
  File "/tests/test_simply.py", line 85 in test_new_pool
  File "/usr/local/lib/python3.10/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.10/site-packages/_pytest/python.py", line 1641 in runtest
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 311 in from_call
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/usr/local/lib/python3.10/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main
  File "/usr/local/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/usr/local/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/usr/local/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
  File "/usr/local/lib/python3.10/runpy.py", line 86 in _run_code
  File "/usr/local/lib/python3.10/runpy.py", line 196 in _run_module_as_main

gdamore commented 6 months ago

So, not too sure what Python or pynng is doing under the hood (maybe it is trying to be clever), but nng is not fork-reentrant safe, because it's multithreaded, and it is almost impossible to write fork-reentrant safe code once you start using mutexes. The POSIX group really screwed up here.

If you use posix_spawn(), or if you do your forking before you initialize any NNG sockets, it should be fine. Fork is also fine, if you do so using whatever flags are needed to ensure that threads are not duplicated, and follow that up with an exec().
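As a rough illustration of the "start a new program" route, here is a sketch that launches each worker as a fresh Python interpreter through the standard `subprocess` module, so the child does not run with a copy of the parent's nng state. `WORKER_SOURCE`, `start_worker` and `run_worker` are hypothetical names, and the inline worker only prints a line where a real one would import pynng and open its sockets.

```python
import subprocess
import sys

# Hypothetical worker program. A real worker would import pynng and open
# its sockets here, inside its own freshly started interpreter.
WORKER_SOURCE = """
print("worker ready")
"""


def start_worker():
    # Popen starts a new program rather than continuing in a forked copy
    # of this process, so no nng threads or mutexes are inherited.
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )


def run_worker():
    proc = start_worker()
    out, _ = proc.communicate(timeout=30)
    return proc.returncode, out.strip()


if __name__ == "__main__":
    code, out = run_worker()
    print("exit code:", code, "output:", out)
```

The other option mentioned above, forking before any NNG socket exists, only changes ordering: start every worker process first and open the parent's sockets afterwards.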

You simply cannot allow the instance of NNG started in a parent to be used in a child. It won't work. The panic here is a safety to let users know that they are trying to do something that won't work reliably, if ever, and I don't want to debug someone's messed up ("forked up") application if they try to do this.
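An application can add a similar safety check on its own side. The sketch below is a hypothetical helper, not part of pynng or nng: it records the PID of the process that created a resource and raises a Python exception if a different process (for example a forked child) tries to use it, which is easier to diagnose than an abort.

```python
import os


class ForkGuard:
    """Remember the creating process and refuse use from any other one."""

    def __init__(self):
        # PID of the process that created the guarded resource.
        self._owner_pid = os.getpid()

    def check(self, current_pid=None):
        # current_pid exists only to make the check easy to exercise in
        # tests; normal callers leave it as None.
        pid = os.getpid() if current_pid is None else current_pid
        if pid != self._owner_pid:
            raise RuntimeError(
                f"resource created in pid {self._owner_pid} "
                f"but used in pid {pid}; create it after the fork instead"
            )


if __name__ == "__main__":
    guard = ForkGuard()
    guard.check()  # same process, so this passes silently
    print("guard ok in pid", os.getpid())
```

A wrapper around the application's socket objects could call `check()` before every send or receive; the sketch assumes all sockets are created through such a wrapper.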