epics-base / p4p

Python bindings for the PVAccess network client and server.
BSD 3-Clause "New" or "Revised" License

Segmentation fault on Python3.7 #109

Closed AlexanderWells-diamond closed 1 year ago

AlexanderWells-diamond commented 1 year ago

It seems the newest release of p4p, 4.1.7, is causing a segmentation fault in tests for pythonSoftIoc, as seen here.

Unfortunately I am unable to reproduce this issue locally: I cannot install p4p==4.1.7 due to an apparent conflict with NumPy that I cannot explain, as I'm using the same version (1.21.6) as our CI.

The issue occurs at this line of our test:

from p4p.client.asyncio import Context

And here's the call stack that the segfault printed:

Current thread 0x00007f43788d9740 (most recent call first):
  File "/tmp/tmp.8MU0smYiyK/venv/lib/python3.7/site-packages/p4p/nt/common.py", line 18 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/tmp/tmp.8MU0smYiyK/venv/lib/python3.7/site-packages/p4p/nt/__init__.py", line 12 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/tmp/tmp.8MU0smYiyK/venv/lib/python3.7/site-packages/p4p/client/raw.py", line 19 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/tmp/tmp.8MU0smYiyK/venv/lib/python3.7/site-packages/p4p/client/asyncio.py", line 9 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/project/tests/test_asyncio.py", line 52 in test_asyncio_ioc

Please let me know if there's any more information we can provide.

mdavidsaver commented 1 year ago

I suspect this crash is due to an ABI issue. It appears to me that the pythonSoftIOC CI build is being done in quay.io/pypa/manylinux2014_x86_64, and is building p4p 4.1.7 from source. The corresponding wheels for epicscorelibs and pvxslibs were built with manylinux1.

Is it possible to persuade cibuildwheel to call pip with --only-binary epicscorelibs,pvxslibs,p4p?

This would make future occurrences more obvious.
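A minimal sketch of the pip invocation being asked for, written as a small Python helper. The helper name `pip_install_command` and the `BINARY_ONLY` tuple are illustrative, not part of p4p or cibuildwheel; only pip's `--only-binary` option is taken from the comment above. With this option pip fails outright when no matching wheel exists, rather than silently building from source.

```python
import sys

# Packages that should only ever be installed from prebuilt wheels
# (the three named in the comment above).
BINARY_ONLY = ("epicscorelibs", "pvxslibs", "p4p")


def pip_install_command(requirements, binary_only=BINARY_ONLY):
    """Build a `pip install` argv that refuses source builds for `binary_only`."""
    return [
        sys.executable, "-m", "pip", "install",
        # Comma-separated package names for which sdists must not be used.
        "--only-binary", ",".join(binary_only),
        *requirements,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pip_install_command(["p4p==4.1.7"])))
```

Passing the resulting list to `subprocess.run(..., check=True)` would turn a missing wheel into a visible install error instead of a later segfault.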

As for why the 3.7 linux/x86_64 wheel wasn't being used? Well, it wasn't there.

GHA was on the blink the day I did the release, with jobs being "cancelled" randomly. I thought I had manually re-run these jobs to the point that all had passed, but it looks like I stopped with two left uncompleted: 3.5 and 3.7 linux/x86_64. Oops...

I've run them now and verified that p4p-4.1.7-cp37-cp37m-manylinux1_x86_64.whl has been uploaded.
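For readers checking which interpreter a given wheel targets, the filename can be split into its tags. This is a rough sketch that follows the wheel filename convention (name, version, optional build tag, Python tag, ABI tag, platform tag); `WheelTags` and `parse_wheel_filename` are illustrative names, and real tooling would more likely rely on an existing packaging library.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(name, version, build, python, abi, platform)


if __name__ == "__main__":
    print(parse_wheel_filename("p4p-4.1.7-cp37-cp37m-manylinux1_x86_64.whl"))
```

For the wheel named above, the Python tag is `cp37`, the ABI tag is `cp37m`, and the platform tag is `manylinux1_x86_64`. When no wheel with matching tags is published, pip falls back to the source distribution, which is how the CI ended up with a locally built p4p.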

mdavidsaver commented 1 year ago

Some details which turned out to be irrelevant...

Current thread 0x00007f43788d9740 (most recent call first):
  File "/tmp/tmp.8MU0smYiyK/venv/lib/python3.7/site-packages/p4p/nt/common.py", line 18 in <module>

https://github.com/mdavidsaver/p4p/blob/ef43365bf6c8ca5ac14d0d827d4d5d6395afc2fc/src/p4p/nt/common.py#L15-L19

A test run using https://github.com/mdavidsaver/ci-core-dumper/pull/1 gives a bit more information, which is a fair argument for that PR being good enough.

Thread 1 (Thread 0x7fe59b9f8740 (LWP 445)):
  #0  0x00007fe59b6114fb in raise () from /lib64/libpthread.so.0
  #1  <signal handler called>
  #2  0x00007fe58d7f2c04 in pvxs::TypeDef::_append(pvxs::Member&, pvxs::Member const&) () from /tmp/tmp.4ZFQcoKe98/venv/lib/python3.7/site-packages/p4p/../pvxslibs/lib/libpvxs.so.1.2
  #3  0x00007fe58dae848c in p4p::appendPrototype(pvxs::TypeDef&, _object*) () from /tmp/tmp.4ZFQcoKe98/venv/lib/python3.7/site-packages/p4p/_p4p.cpython-37m-x86_64-linux-gnu.so
  #4  0x00007fe58dace635 in __pyx_pw_3p4p_4_p4p_5_Type_1__init__(_object*, _object*, _object*) () from /tmp/tmp.4ZFQcoKe98/venv/lib/python3.7/site-packages/p4p/_p4p.cpython-37m-x86_64-linux-gnu.so
...

AlexanderWells-diamond commented 1 year ago

Thank you Michael, our CI now passes with the prebuilt wheel.

There unfortunately seems to be no simple way to pass options through cibuildwheel to pip wheel itself. The obvious approach is to set CIBW_ENVIRONMENT: PIP_ONLY_BINARY=epicscorelibs,pvxslibs,p4p as per the pip docs, but for reasons unknown it isn't picked up. So we'll just have to watch out for this sort of issue in the future.
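One possible way to watch for this kind of failure is an import smoke test that runs in a child process, so a crash is reported instead of taking down the whole test session. The sketch below is an assumption about how that could be done, not something used by pythonSoftIoc or p4p: `probe` is a hypothetical helper, and the negative return code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def probe(statement: str) -> str:
    """Run `statement` in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        # faulthandler makes the child dump a Python traceback on a fatal signal.
        [sys.executable, "-X", "faulthandler", "-c", statement],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return f"exit code {proc.returncode}"


if __name__ == "__main__":
    # The import that crashed in the report above; a missing package shows up
    # as "exit code 1" rather than a signal.
    print(probe("from p4p.client.asyncio import Context"))
```

A result of `killed by SIGSEGV` for the p4p import would point at a binary incompatibility of the kind discussed in this thread, before any real tests run.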

mdavidsaver commented 1 year ago

fyi. I have put up a set of release candidate builds:

One of the changes made in setuptools-dso and epicscorelibs should partially address the manylinux vs. host ABI issue seen here.

Now, for GCC builds, the value of -D_GLIBCXX_USE_CXX11_ABI=... is latched when building epicscorelibs, and propagated via epicscorelibs.config:get_config_var('CPPFLAGS'). With this change, if a build pulls in a pypi.org wheel for epicscorelibs, then any local build of p4p and maybe pvxslibs will keep the same value of _GLIBCXX_USE_CXX11_ABI used by the manylinux builds (currently 0 for all images), instead of the local default (1 for gcc >= 5.1).

Of course the "Dual ABI" difference may not be the only source of issues, though I think it is the most common.
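To see which ABI value a set of preprocessor flags carries, the macro can be pulled out of the flags string. This is a small sketch under stated assumptions: `cxx11_abi_from_flags` is a hypothetical helper and the flag strings in the example are made up. In a real environment the string would be whatever the `get_config_var('CPPFLAGS')` call mentioned above returns, which is not called here.

```python
import shlex
from typing import Optional

ABI_MACRO = "_GLIBCXX_USE_CXX11_ABI"


def cxx11_abi_from_flags(cppflags: str) -> Optional[str]:
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI=... in `cppflags`, if any."""
    prefix = "-D" + ABI_MACRO + "="
    value = None
    for token in shlex.split(cppflags):
        if token.startswith(prefix):
            # Keep the last occurrence if the macro is defined more than once.
            value = token[len(prefix):]
    return value


if __name__ == "__main__":
    # Hypothetical flag strings, for illustration only.
    print(cxx11_abi_from_flags("-DUSE_TYPED_RSET -D_GLIBCXX_USE_CXX11_ABI=0"))
    print(cxx11_abi_from_flags("-O2 -g"))
```

A value of `"0"` corresponds to the old libstdc++ ABI used by the manylinux builds described above, while `None` means the flags leave the compiler's default in effect.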