areaDetector / ADPylon

An EPICS areaDetector driver for cameras from Basler using their Pylon SDK.
https://areadetector.github.io/areaDetector/ADPylon/ADPylon.html

IOC receives a segmentation fault either on exit or at random while collecting images #1

AbdallaDalleh closed this issue 1 year ago

AbdallaDalleh commented 1 year ago

Hi

OS: Rocky Linux 8.6
EPICS base: 3.15.6
Support modules: master branches for all.
Pylon SDK: 7.2.1

For some reason, when I type exit at the IOC shell, the IOC exits with a segmentation fault. In addition, while acquiring images, the IOC sometimes crashes suddenly with a segmentation fault, apparently while it is trying to exit for no obvious reason.

Here is the output when running under gdb (the IOC exits immediately):

[New Thread 0x7fffd56bc700 (LWP 49885)]
iocRun: All initialization complete
epics> 
epics> [New Thread 0x7fff937fa700 (LWP 49886)]
create_monitor_set("pylon_settings.req", 1,"P=$(PREFIX),R=$(R)")
[New Thread 0x7fff935f9700 (LWP 49887)]
[New Thread 0x7fff934f8700 (LWP 49888)]
epics> set_savefile_name("pylon_settings.req", "$(PORT)-cam")
epics> 
epics> create_monitor_set("NDStdArrays_settings.req", 1,"P=$(PREFIX),R=image1:")
epics> set_savefile_name("NDStdArrays_settings.req", "$(PORT)-image")
epics> 
epics> create_monitor_set("NDStats_settings.req", 1,"P=$(PREFIX),R=Stats1:")
epics> set_savefile_name("NDStats_settings.req", "$(PORT)-stats")
epics> 
epics> [Thread 0x7fffd59bf700 (LWP 49860) exited]
[Thread 0x7fffd5cc2700 (LWP 49857) exited]
[Thread 0x7fffd5fc5700 (LWP 49854) exited]
[New Thread 0x7fffd59bf700 (LWP 49889)]
[New Thread 0x7fffd5fc5700 (LWP 49890)]
[Thread 0x7fffdd451700 (LWP 49845) exited]
2023/03/09 08:55:16.991 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.003 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

[New Thread 0x7fffdd451700 (LWP 49891)]
[New Thread 0x7fff92818700 (LWP 49892)]
2023/03/09 08:55:17.516 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.533 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.549 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.664 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.797 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.826 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.855 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.883 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.911 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:17.939 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

[Thread 0x7fff934f8700 (LWP 49888) exited]
[Thread 0x7fffd5fc5700 (LWP 49890) exited]
[Thread 0x7fffd59bf700 (LWP 49889) exited]
[Thread 0x7fff935f9700 (LWP 49887) exited]
[Thread 0x7fffd4e20700 (LWP 49868) exited]
[Thread 0x7fffd55a5700 (LWP 49863) exited]
2023/03/09 08:55:18.061 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:18.174 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:18.188 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

2023/03/09 08:55:18.203 Param[SERIAL_NUMBER] GenICamFeature::read: feature DeviceSerialNumber exception basic_string::_M_construct null not valid

[Thread 0x7fffd6fc7700 (LWP 49850) exited]

Thread 1 "ioc" received signal SIGSEGV, Segmentation fault.
0x00007ffff6199c07 in ADPylon::shutdown (this=0x70d168) at ../ADPylon.cpp:206
206     lock();
Missing separate debuginfos, use: yum debuginfo-install epics-base-3.15-6.el8.x86_64 glibc-2.28-189.5.el8_6.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-14.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libcom_err-1.45.6-4.el8.x86_64 libgcc-8.5.0-10.1.el8_6.x86_64 libselinux-2.9-5.el8.x86_64 libstdc++-8.5.0-10.1.el8_6.x86_64 libtirpc-1.1.4-6.el8.x86_64 libxcb-1.13.1-1.el8.x86_64 libxml2-2.9.7-13.el8_6.1.x86_64 ncurses-libs-6.1-9.20180224.el8.x86_64 openssl-libs-1.1.1k-7.el8_6.x86_64 pcre2-10.32-3.el8_6.x86_64 readline-7.0-10.el8.x86_64 xz-libs-5.2.4-4.el8_6.x86_64 zlib-1.2.11-18.el8_5.x86_64
(gdb) 

Here is the backtrace from the core dump:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /opt/pylon/lib/libpylonbase.so.7.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/f4/66e310677c4b56b5779261470d8d5696a88ca2.debug
Missing separate debuginfo for /opt/pylon/lib/libpylonutility.so.7.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/0c/ceb57d50561e57b91bc614c321a143ffab0053.debug
Missing separate debuginfo for /opt/pylon/lib/libGenApi_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/14/60d08057af1a68914df51e4e3007d886222ee0.debug
Missing separate debuginfo for /opt/pylon/lib/libLog_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/78/fea755804dfc876302517b11b48c19d76648ce.debug
Missing separate debuginfo for /opt/pylon/lib/libGCBase_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ad/1c25d62e3715363a28a90884f04a9d0cfb6445.debug
Missing separate debuginfo for /opt/pylon/lib/libMathParser_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/3d/735650a47622f41be22cd16bf46dcbc932b7a8.debug
Missing separate debuginfo for /opt/pylon/lib/libXmlParser_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/78/d3020c988cbd9885de2600a67fe3ff414fcf81.debug
Missing separate debuginfo for /opt/pylon/lib/libNodeMapData_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/42/81b08c71ddee6a130b0b059369331035327ec8.debug
Missing separate debuginfo for /opt/pylon/lib/liblog4cpp_gcc_v3_1_Basler_pylon.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c4/48a0c812d3c0126ddef02a5002a48d89d81bf0.debug
Missing separate debuginfo for /opt/pylon/lib/libpylon_TL_camemu.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/bc/bfe3646fb7212c2d497662505ec56978eacde9.debug
Missing separate debuginfo for /opt/pylon/lib/libpylon_TL_gige.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c8/ab1b2606d6b0066025cf52415d3ada8ffcdabe.debug
Missing separate debuginfo for /opt/pylon/lib/libgxapi.so.7.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/33/b2c86474cd14bd883558f5ef92959183f0296b.debug
Missing separate debuginfo for /opt/pylon/lib/libpylon_TL_gtc.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/1d/30e85599899724e8917928bd7334a194623d3b.debug
Missing separate debuginfo for /opt/pylon/lib/libpylon_TL_usb.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/61/4b94c3591d418df46ae7fcb2007867c7c46de9.debug
Missing separate debuginfo for /opt/pylon/lib/libuxapi.so.7.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/13/86cc71a0d272b5deb9db784e8013c3e72e2a8a.debug
Missing separate debuginfo for /opt/pylon/lib/pylon-libusb-1.0.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/98/82c433a9a131a3a414df730ebb83cbe9af6ec2.debug
Core was generated by `./bin/linux-x86_64/ioc iocBoot/ioc/st.cmd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f657a27fc07 in ADPylon::shutdown (this=0x22bbac8) at ../ADPylon.cpp:206
206     lock();
[Current thread is 1 (Thread 0x7f657c0ac840 (LWP 50245))]
Missing separate debuginfos, use: yum debuginfo-install epics-base-3.15-6.el8.x86_64 glibc-2.28-189.5.el8_6.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-14.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libcom_err-1.45.6-4.el8.x86_64 libgcc-8.5.0-10.1.el8_6.x86_64 libselinux-2.9-5.el8.x86_64 libstdc++-8.5.0-10.1.el8_6.x86_64 libtirpc-1.1.4-6.el8.x86_64 libxcb-1.13.1-1.el8.x86_64 libxml2-2.9.7-13.el8_6.1.x86_64 ncurses-libs-6.1-9.20180224.el8.x86_64 openssl-libs-1.1.1k-7.el8_6.x86_64 pcre2-10.32-3.el8_6.x86_64 readline-7.0-10.el8.x86_64 xz-libs-5.2.4-4.el8_6.x86_64 zlib-1.2.11-18.el8_5.x86_64
(gdb) thread all bt
Invalid thread ID: all bt
(gdb) backtrace
#0  0x00007f657a27fc07 in ADPylon::shutdown (this=0x22bbac8) at ../ADPylon.cpp:206
#1  0x0000000000000062 in ?? ()
#2  0x00007f657a27fc19 in ADPylon::shutdown (this=0x22aeea0) at ../ADPylon.cpp:208
#3  0x00007f657a4c0d28 in epicsExitCallAtExitsPvt () from /opt/epics/base/lib/linux-x86_64/libCom.so.3.15
#4  0x00007f657a4c0e21 in epicsExitCallAtExits () from /opt/epics/base/lib/linux-x86_64/libCom.so.3.15
#5  0x00007f657a4c1108 in epicsExit () from /opt/epics/base/lib/linux-x86_64/libCom.so.3.15
#6  0x0000000000406c3d in main (argc=<optimized out>, argv=<optimized out>) at ../iocMain.cpp:21

I suspected the problem might be the latest SDK (7.2.1), so I tried older versions, going back to roughly 6.2, and got the same behavior. Again, the crash happens at random: sometimes within 2 to 3 hours, sometimes only after more than 12 hours.

xiaoqiangwang commented 1 year ago

I had run tests under Debian 12 (bookworm) and did not see such crashes.

Now I have run the tests in a Rocky Linux 8 VM, and I do see the crash on the exit command. It points to the camera_.Close() call. In fact, even calling camera_.IsOpen() inside the shutdown function causes a crash.

xiaoqiangwang commented 1 year ago

I pushed a change that removes the shutdown hook. Please give it a try.

xiaoqiangwang commented 1 year ago

I assume the crash no longer happens, so I am closing the issue.

xiaoqiangwang commented 1 year ago

After upgrading to Pylon SDK 7.3.0 on Rocky Linux, the IOC crashes on exit:

#0  0x00007ffff4b5faff in raise () from /lib64/libc.so.6
#1  0x00007ffff4b32ea5 in abort () from /lib64/libc.so.6
#2  0x00007ffff550109b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
#3  0x00007ffff550753c in __cxxabiv1::__terminate(void (*)()) ()
   from /lib64/libstdc++.so.6
#4  0x00007ffff5507597 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fffed4033a5 in ?? ()
   from /home/l_wang/pylon-7.3.0.27189_linux-x86_64/lib/libuxapi.so.7.3
#6  0x00007ffff4b6229c in __run_exit_handlers () from /lib64/libc.so.6
#7  0x00007ffff4b623d0 in exit () from /lib64/libc.so.6
#8  0x00007ffff583e6cc in epicsExit (status=status@entry=0)
    at ../misc/epicsExit.c:187
#9  0x0000000000408c2d in main (argc=<optimized out>, argv=<optimized out>)
    at ../pylonAppMain.cpp:21

The crash occurs inside an exit handler in libuxapi.so.7.3, which ends up calling std::terminate().
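The two backtraces correspond to two different phases of shutdown. In the first, the driver's shutdown hook is invoked from epicsExitCallAtExits(); in the second, epicsExit() has already called exit(), and the C runtime is running a handler inside libuxapi. The sketch below is a simplified model of that ordering, assuming EPICS runs its epicsAtExit() callbacks last-registered-first and only then calls exit(), which in turn runs the handlers that libraries registered with the C runtime. ExitModel, atExit(), cAtExit() and runShutdownScenario() are hypothetical names, not EPICS or Pylon APIs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of process shutdown; not EPICS or Pylon code.
// Phase 1 mirrors epicsExit() calling the callbacks registered with epicsAtExit();
// phase 2 mirrors the C library's exit() running handlers registered by libraries.
class ExitModel {
public:
    using Handler = std::function<void()>;

    // Stand-in for epicsAtExit(): driver shutdown hooks.
    void atExit(Handler h) { epicsHandlers_.push_back(std::move(h)); }

    // Stand-in for a handler a shared library registers with the C runtime.
    void cAtExit(Handler h) { cHandlers_.push_back(std::move(h)); }

    // Runs both phases in order, each in reverse registration order (LIFO).
    void runExit() {
        runLifo(epicsHandlers_);
        runLifo(cHandlers_);
    }

private:
    static void runLifo(std::vector<Handler>& handlers) {
        for (auto it = handlers.rbegin(); it != handlers.rend(); ++it) {
            (*it)();
        }
        handlers.clear();
    }

    std::vector<Handler> epicsHandlers_;
    std::vector<Handler> cHandlers_;
};

// Records the order in which the modeled handlers run.
std::vector<std::string> runShutdownScenario() {
    std::vector<std::string> order;
    ExitModel model;
    // A library registers its exit handler early, when it is loaded.
    model.cAtExit([&order] { order.push_back("library exit handler"); });
    // The IOC later registers two driver shutdown hooks.
    model.atExit([&order] { order.push_back("driver A shutdown"); });
    model.atExit([&order] { order.push_back("driver B shutdown"); });
    model.runExit();
    return order;
}
```

In this model the recorded order is driver B, driver A, then the library handler: whatever a driver hook leaves running is still live when the library's own exit handler executes.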

On Debian 11.7 (Bullseye), Pylon SDK 7.3.0 also crashes on exit, while version 7.2.1 does not.

xiaoqiangwang commented 1 year ago

It seems one has to call PylonTerminate() before the program exits, so I added the shutdown handler back in 0421ef9. The good news is that the IOC now exits cleanly on both Debian and Rocky Linux, with both the 7.2 and 7.3 Pylon SDKs.
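Here is a sketch of the shape such a hook could take. It assumes, as the comment above suggests, that the camera must be closed and the SDK terminated before exit() runs the SDK's own exit handlers. To keep it runnable without Pylon or EPICS installed, the SDK and camera are replaced by hypothetical stand-ins (SdkStub, CameraStub, DriverStub, driverShutdownHook). In the real driver the corresponding calls would be the camera's Close() and PylonTerminate(), with the hook registered through epicsAtExit(). The actual change in 0421ef9 may be structured differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Pylon runtime and a camera object; they only
// record calls so the shutdown ordering can be checked without the real SDK.
struct SdkStub {
    bool initialized = false;
    std::vector<std::string> calls;

    void initialize() { initialized = true; calls.push_back("initialize"); }
    void terminate()  { initialized = false; calls.push_back("terminate"); }
};

struct CameraStub {
    SdkStub& sdk;
    bool open = false;

    explicit CameraStub(SdkStub& s) : sdk(s) {}
    void openCamera() { open = true; sdk.calls.push_back("open"); }
    void close() {
        // Closing a camera after the runtime is gone must not happen;
        // the stub asserts on it instead of crashing.
        assert(sdk.initialized);
        open = false;
        sdk.calls.push_back("close");
    }
};

// Hypothetical driver holding one camera. shutdown() plays the role of the
// hook registered with epicsAtExit(): close the camera first, then shut the
// SDK down, so the SDK is already terminated when exit() runs later.
class DriverStub {
public:
    explicit DriverStub(SdkStub& sdk) : sdk_(sdk), camera_(sdk) {
        sdk_.initialize();
        camera_.openCamera();
    }

    void shutdown() {
        if (done_) return;           // tolerate being called more than once
        done_ = true;
        if (camera_.open) camera_.close();
        sdk_.terminate();            // PylonTerminate() in the real driver
    }

private:
    SdkStub& sdk_;
    CameraStub camera_;
    bool done_ = false;
};

// Free-function trampoline with the void(void*) shape of an EPICS exit
// callback (epicsExitFunc), forwarding to the driver instance.
void driverShutdownHook(void* arg) {
    static_cast<DriverStub*>(arg)->shutdown();
}
```

Calling the hook twice is harmless because of the done_ guard, and the recorded call sequence is initialize, open, close, terminate.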