Closed fbi-ranger closed 5 years ago
Let me look into this, Florian, and I'll get back to you.
Thank you Fish. That is very kind of you.
But please do not forget it’s Christmas. Have a nice Christmas Eve.
Kind regards, Florian
Thank you Fish. That is very kind of you.
Not at all! I have no life outside of Hercules and I couldn't sleep anyway. :)
But please do not forget it’s Christmas. Have a nice Christmas Eve.
Thank you. Same to you too!
I am unable to reproduce this problem on my CentOS 6.10 system. I started hercules with an LCS device (and the device was successfully opened) and then immediately did a quit, and Hercules ended normally just like it always does for me. I tried it both as a regular user and as root too, and both times Hercules ended cleanly.
I did not actually try to IPL my guest however. Does the problem occur for you only when a guest operating system is IPLed and then later ended? Or does it occur just starting Hercules and then immediately quitting, without doing an IPL?
Also, have you checked your build log to ensure Hercules was built correctly? (without any errors or warnings).
Also, when you built your Hercules, did you build the External Packages beforehand? Or are you using the ones that come delivered with Hercules? If you didn't bother to build the External Packages for yourself, you might want to try doing that and then rebuilding Hercules afterwards. Perhaps that's where your problem is? What system are you running on anyway?
One other thing too: you might want to use gdb to try and determine exactly where in Hercules it is crashing. I found a web page on stackoverflow.com that explains how to do it:
Basically, based on the very first reply of the above mentioned stackoverflow post, you should do:
$ gdb hercules
(gdb) run -f myhercconfig.cnf
<segfault happens here>
(gdb) backtrace
<offending code is shown here>
I would try it myself, but as I explained, I was unable to make Hercules crash. It works fine for me. But since it crashes for you, it would be very helpful if you could determine precisely where Hercules is crashing. Thanks!
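For a non-interactive variant of the session above, the same two gdb commands can be passed on the command line. This is only a sketch: -batch, -ex and --args are standard gdb options, and myhercconfig.cnf is the placeholder config name used above. The snippet just prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Placeholder config file name taken from the example above; substitute your own.
CONF="myhercconfig.cnf"

# -batch        : exit gdb once all commands have run
# -ex run       : start Hercules under gdb
# -ex backtrace : print the stack once the program stops (e.g. on a signal)
# --args        : everything after this is the program and its arguments
GDB_CMD="gdb -batch -ex run -ex backtrace --args hercules -f $CONF"

# Print the command for review; paste it into a terminal to actually run it.
echo "$GDB_CMD"
```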
No, I started Hercules and quit immediately. No OS was started before quit.
Linux is openSUSE 15 with the latest fixes applied. Hercules 4.0.0 (the other Hyperion) runs without any problem.
Config is now shirked to some DASDs in order to keep the log small. LCS is enabled (F00/F01).
I have started gdb:
(gdb) run -f z390/etc/hercmini.cnf
Starting program: /local/sys1/z390/herc15001/bin/hercules -f z390/etc/hercmini.cnf
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C "debuginfo(build-id)=4062821f420b0c3d46ea03f208cae1a710516c4e"
Missing separate debuginfo for /lib64/librt.so.1
Try: zypper install -C "debuginfo(build-id)=a1a84d304e283d52e44332ec3fbf2f6f705bd5ff"
Missing separate debuginfo for /lib64/libresolv.so.2
Try: zypper install -C "debuginfo(build-id)=70404535e145645b599c469ea4476fc4c8357b03"
Missing separate debuginfo for /lib64/libm.so.6
Try: zypper install -C "debuginfo(build-id)=1e8038d58788ff7546c54ef151a441567c5119dc"
Missing separate debuginfo for /lib64/libdl.so.2
Try: zypper install -C "debuginfo(build-id)=466795ee7b9ca76122c66d034e7f18c7593d306e"
Missing separate debuginfo for /usr/lib64/libbz2.so.1
Try: zypper install -C "debuginfo(build-id)=78a5e01ade6b3d8db9bc9bcf7d7452c057ab7ac1"
Missing separate debuginfo for /lib64/libz.so.1
Try: zypper install -C "debuginfo(build-id)=9ca7a2b246871c3eeaa954a4a1315bbbbd335cc7"
Missing separate debuginfo for /lib64/libpthread.so.0
Try: zypper install -C "debuginfo(build-id)=f82798ed148c2a88dcddbaa67c838c824e1a43e9"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C "debuginfo(build-id)=95b799e45a989e22af6e9d31ec729170e2c92dd2"
[New Thread 0x7ffff5818700 (LWP 28233)]
HHC00109E set_thread_priority( 5 ) failed: Operation not permitted
HHC00007I Previous message from function 'impl' at impl.c(837)
HHC00110W Defaulting all threads to priority 1
HHC00007I Previous message from function 'impl' at impl.c(840)
HHC00100I Thread id 00007ffff7fc9740, prio -1, name 'impl_thread' started
HHC00100I Thread id 00007ffff5818700, prio -1, name 'logger_thread' started
HHC01413I Hercules version 4.2.0.0-SDL-gc8addaaf-modified (4.2.0.0)
HHC01414I (C) Copyright 1999-2018 by Roger Bowler, Jan Jaeger, and others
HHC01417I YBI-15001-9473
HHC01415I Build date: Dec 23 2018 at 22:50:08
HHC01417I Built with: GCC 7.3.1 20180323 [gcc-7-branch revision 258812]
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Modes: S/370 ESA/390 z/Arch
HHC01417I Max CPU Engines: 12
HHC01417I Using shared libraries
HHC01417I Using setresuid() for setting privileges
HHC01417I Using POSIX threads Threading Model
HHC01417I Using Error-Checking Mutex Locking Model
HHC01417I With Shared Devices support
HHC01417I With Dynamic loading support
HHC01417I With External GUI support
HHC01417I With IPV6 support
HHC01417I With HTTP Server support
HHC01417I With sqrtl support
HHC01417I With SIGABEND handler
HHC01417I With CCKD BZIP2 support
HHC01417I With HET BZIP2 support
HHC01417I With ZLIB support
HHC01417I With Regular Expressions support
HHC01417I Without Object REXX support
HHC01417I With Regina REXX support
HHC01417I With Automatic Operator support
HHC01417I Without National Language Support
HHC01417I With CCKD64 Support
HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 hatomics=C11
HHC01417I Running on: hercules (Linux-4.12.14-lp150.12.28-default x86_64) MP=8
HHC01417I Built with decNumber external package version 3.68.0.79-g53f2512
HHC01417I Built with SoftFloat external package version 3.5.0.82-g1c66591
HHC01417I Built with telnet external package version 1.0.0.41-ged0ddec
HHC00018W Hercules is NOT running in elevated mode
HHC00007I Previous message from function 'impl' at impl.c(895)
Missing separate debuginfo for /lib64/libnss_files.so.2
Try: zypper install -C "debuginfo(build-id)=e71acc15f2935641bffdb8f78f0faef6f4c7acff"
HHC00150I Crypto module loaded (C) Copyright 2003-2016 by Bernard van der Helm
HHC01417I Built with crypto external package version 1.0.0.26-gefe199e
HHC00151I Activated facility: Message Security Assist
HHC00151I Activated facility: Message Security Assist Extension 1, 2, 3 and 4
Missing separate debuginfo for /lib64/libcrypt.so.1
Try: zypper install -C "debuginfo(build-id)=9a002f8c48735ff1fe0cbe4aa64b9a0cb4b2f84e"
HHC17528I REXX(Regina) VERSION: REXX-Regina_3.9.1 5.00 5 Apr 2015
HHC17529I REXX(Regina) SOURCE: UNIX
HHC17525I REXX(Regina) Rexx has been started/enabled
HHC17500I REXX(Regina) Mode : Command
HHC17500I REXX(Regina) MsgLevel : Off
HHC17500I REXX(Regina) MsgPrefix : Off
HHC17500I REXX(Regina) ErrPrefix : Off
HHC17500I REXX(Regina) Resolver : On
HHC17500I REXX(Regina) SysPath ( 6) : On
HHC17500I REXX(Regina) RexxPath ( 0) :
HHC17500I REXX(Regina) Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx
[New Thread 0x7ffff4bc8700 (LWP 28234)]
[New Thread 0x7ffff48c4700 (LWP 28235)]
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
[New Thread 0x7ffff47c3700 (LWP 28236)]
HHC00100I Thread id 00007ffff47c3700, prio -1, name 'timer_thread' started
HHC00100I Thread id 00007ffff48c4700, prio -1, name 'Processor CP00' started
HHC00811I Processor CP00: architecture mode z/Arch
HHC02204I CPUSERIAL set to 1BA2EF
HHC02204I CPUMODEL set to 2827
HHC02204I MODEL set to hardware(H20) capacity(H20) perm() temp()
HHC02204I PLANT set to 01
HHC17003I MAIN storage is 8G (mainsize); storage is not locked
[New Thread 0x7ffff4ac7700 (LWP 28237)]
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff4ac7700, prio -1, name 'Processor CP01' started
HHC00811I Processor CP01: architecture mode z/Arch
[New Thread 0x7ffff49c6700 (LWP 28238)]
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff49c6700, prio -1, name 'Processor CP02' started
HHC00811I Processor CP02: architecture mode z/Arch
[New Thread 0x7ffff46c2700 (LWP 28239)]
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff46c2700, prio -1, name 'Processor CP03' started
HHC00811I Processor CP03: architecture mode z/Arch
HHC02204I NUMCPU set to 4
HHC02204I MANUFACTURER set to IBM
HHC02204I ARCHLVL set to z/Arch
HHC00898W Facility( 044_PFPO ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(3438)
HHC02204I ECPSVM set to disabled
HHC02204I LOADPARM set to
HHC02204I LPARNAME set to SYSZ01
HHC02204I LPARNUM set to 1
HHC02204I CPUIDFMT set to 0
HHC02204I PANTITLE set to z/VM 6.3 PHOENIX SYSRES 6300
HHC02204I SCPIMPLY set to ON
HHC01474I Using internal codepage conversion table default
HHC02204I DIAG8CMD set to ENABLE NOECHO
HHC02204I PANRATE set to SLOW
[New Thread 0x7ffff43af700 (LWP 28240)]
HHC00100I Thread id 00007ffff43af700, prio -1, name 'console_connect' started
HHC01024I Waiting for console connections on port 3270
HHC01250E 0:000C Card: error in function access(): No such file or directory
HHC00007I Previous message from function 'cardrdr_init_handler' at cardrdr.c(322)
HHC01463E 0:000C device initialization failed
HHC00007I Previous message from function 'attach_device' at config.c(1301)
[Detaching after fork from child process 28241]
HHC00901I 0:0F00 LCS: Interface tap0, type TAP opened
HHC00921I CTC: lcs device port 00: manual Multicast assist enabled
HHC00935I CTC: lcs device port 00: manual Checksum Offload enabled
[New Thread 0x7fffd77ca700 (LWP 28243)]
HHC01437I Config file[164] z390/etc/hercmini.cnf: including file /local/sys1/z390/etc/plxpex.cnf
HHC00414I 0:461A CKD file /local/sys1/s390/dasd/PLXPEA.461A: cyls 32760 heads 15 tracks 491400 trklen 56832
HHC00414I 0:461B CKD file /local/sys1/s390/dasd/PLXPEB.461B: cyls 32760 heads 15 tracks 491400 trklen 56832
HHC00414I 0:461C CKD file /local/sys1/s390/dasd/PLXPEC.461C: cyls 32760 heads 15 tracks 491400 trklen 56832
HHC00414I 0:461D CKD file /local/sys1/s390/dasd/PLXPED.461D: cyls 32760 heads 15 tracks 491400 trklen 56832
HHC01437I Config file[196] z390/etc/hercmini.cnf: including file /local/sys1/z390/etc/zvm630.cnf
HHC00414I 0:6300 CKD file /local/sys1/s390/mdasd/V631RS.6300: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6301 CKD file /local/sys1/s390/mdasd/V63RL1.6301: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6302 CKD file /local/sys1/s390/mdasd/V63CM1.6302: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6303 CKD file /local/sys1/s390/mdasd/V631S1.6303: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6304 CKD file /local/sys1/s390/mdasd/V631P1.6304: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6305 CKD file /local/sys1/s390/mdasd/V631W1.6305: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6306 CKD file /local/sys1/s390/mdasd/V631T1.6306: cyls 10017 heads 15 tracks 150255 trklen 56832
HHC00414I 0:6309 CKD file /local/sys1/s390/mdasd/V63SRC.6309: cyls 3339 heads 15 tracks 50085 trklen 56832
HHC00151I Activated facility: Message Security Assist +
HHC00151I Activated facility: Message Security Assist Extension 1, 2, 3 and 4
HHC17528I REXX(Regina) VERSION: REXX-Regina_3.9.1 5.00 5 Apr 2015
HHC17529I REXX(Regina) SOURCE: UNIX
HHC17525I REXX(Regina) Rexx has been started/enabled
HHC17500I REXX(Regina) Mode : Command
HHC17500I REXX(Regina) MsgLevel : Off
HHC17500I REXX(Regina) MsgPrefix : Off
HHC17500I REXX(Regina) ErrPrefix : Off
HHC17500I REXX(Regina) Resolver : On
HHC17500I REXX(Regina) SysPath ( 6) : On
HHC17500I REXX(Regina) RexxPath ( 0) :
HHC17500I REXX(Regina) Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff47c3700, prio -1, name 'timer_thread' started
HHC00100I Thread id 00007ffff48c4700, prio -1, name 'Processor CP00' started
HHC00811I Processor CP00: architecture mode z/Arch
HHC02204I CPUSERIAL set to 1BA2EF
HHC02204I CPUMODEL set to 2827
HHC02204I MODEL set to hardware(H20) capacity(H20) perm() temp()
HHC02204I PLANT set to 01
HHC17003I MAIN storage is 8G (mainsize); storage is not locked
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff4ac7700, prio -1, name 'Processor CP01' started
HHC00811I Processor CP01: architecture mode z/Arch
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff49c6700, prio -1, name 'Processor CP02' started
HHC00811I Processor CP02: architecture mode z/Arch
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007ffff46c2700, prio -1, name 'Processor CP03' started
HHC00811I Processor CP03: architecture mode z/Arch
HHC02204I NUMCPU set to 4
HHC02204I MANUFACTURER set to IBM
HHC02204I ARCHLVL set to z/Arch
HHC00898W Facility( 044_PFPO ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(3438)
HHC02204I ECPSVM set to disabled
HHC02204I LOADPARM set to
HHC02204I LPARNAME set to SYSZ01
HHC02204I LPARNUM set to 1
HHC02204I CPUIDFMT set to 0
HHC02204I PANTITLE set to z/VM 6.3 PHOENIX SYSRES 6300
HHC02204I SCPIMPLY set to ON
HHC01474I Using internal codepage conversion table default
HHC02204I DIAG8CMD set to ENABLE NOECHO
HHC02204I PANRATE set to SLOW
HHC00100I Thread id 00007ffff43af700, prio -1, name 'console_connect' started
HHC01024I Waiting for console connections on port 3270
HHC01250E 0:000C Card: error in function access(): No such file or directory
HHC00007I Previous message from function 'cardrdr_init_handler' at cardrdr.c(322)
HHC01463E 0:000C device initialization failed
HHC00007I Previous message from function 'attach_device' at config.c(1301)
HHC00901I 0:0F00 LCS: Interface tap0, type TAP opened
HHC00921I CTC: lcs device port 00: manual Multicast assist enabled
HHC00935I CTC: lcs device port 00: manual Checksum Offload enabled
HHC01437I Config file[164] z390/etc/hercmini.cnf: including file /local/sys1/z390/etc/plxpex
HHC00414I 0:461A CKD file /local/sys1/s390/dasd/PLXPEA.461A: cyls 32760 heads 15 tracks 4914
HHC00414I 0:461B CKD file /local/sys1/s390/dasd/PLXPEB.461B: cyls 32760 heads 15 tracks 4914
HHC00414I 0:461C CKD file /local/sys1/s390/dasd/PLXPEC.461C: cyls 32760 heads 15 tracks 4914
HHC00414I 0:461D CKD file /local/sys1/s390/dasd/PLXPED.461D: cyls 32760 heads 15 tracks 4914
HHC01437I Config file[196] z390/etc/hercmini.cnf: including file /local/sys1/z390/etc/zvm630
HHC00414I 0:6300 CKD file /local/sys1/s390/mdasd/V631RS.6300: cyls 10017 heads 15 tracks 150
HHC00414I 0:6301 CKD file /local/sys1/s390/mdasd/V63RL1.6301: cyls 10017 heads 15 tracks 150
HHC00414I 0:6302 CKD file /local/sys1/s390/mdasd/V63CM1.6302: cyls 10017 heads 15 tracks 150
HHC00414I 0:6303 CKD file /local/sys1/s390/mdasd/V631S1.6303: cyls 10017 heads 15 tracks 150
HHC00414I 0:6304 CKD file /local/sys1/s390/mdasd/V631P1.6304: cyls 10017 heads 15 tracks 150
HHC00414I 0:6305 CKD file /local/sys1/s390/mdasd/V631W1.6305: cyls 10017 heads 15 tracks 150
HHC00414I 0:6306 CKD file /local/sys1/s390/mdasd/V631T1.6306: cyls 10017 heads 15 tracks 150
HHC00414I 0:6309 CKD file /local/sys1/s390/mdasd/V63SRC.6309: cyls 3339 heads 15 tracks 5008
Hercules started correctly. F00 is opened as TAP. F01 remains closed, probably because no OS has been started.
I enter quit:
HHC00101I Thread id 00007ffff48c4700, prio -1, name 'Processor CP00' ended
[Thread 0x7ffff48c4700 (LWP 28235) exited]
HHC00101I Thread id 00007ffff4ac7700, prio -1, name 'Processor CP01' ended
[Thread 0x7ffff4ac7700 (LWP 28237) exited]
HHC00101I Thread id 00007ffff49c6700, prio -1, name 'Processor CP02' ended
[Thread 0x7ffff49c6700 (LWP 28238) exited]
HHC00101I Thread id 00007ffff46c2700, prio -1, name 'Processor CP03' ended
[Thread 0x7ffff46c2700 (LWP 28239) exited]
Thread 10 "LCS_PortThread" received signal SIGUSR2, User defined signal 2.
[Switching to Thread 0x7fffd77ca700 (LWP 28243)]
0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
#1 0x00007fffd77cfa06 in LCS_PortThread (arg=arg@entry=0x7b9630) at ctc_lcs.c:2280
#2 0x00007ffff6dcae1d in hthread_func (arg2=0x7cea50) at hthreads.c:796
#3 0x00007ffff5bda559 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffff591181f in clone () from /lib64/libc.so.6
(gdb)
Starting an OS (z/VM) works, and the LCS also works correctly. However, doing an orderly shutdown followed by quit leads to the SEGMENTATION FAULT. So it makes no difference whether an OS is started or not.
I also did a devlist F00 and devlist F01 before starting the OS. Both commands show that the device is open.
Restarting Hercules after the SEGMENTATION FAULT leads to the following error messages during Hercules startup:
HHC00138E Error setting TUN/TAP mode : Interrupted system call
HHC00007I Previous message from function 'TUNTAP_CreateInterface' at tuntap.c(269)
HHC00900E 0:0F00 LCS: Error in function TUNTAP_CreateInterface: Unknown error -1
HHC00007I Previous message from function 'LCS_Init' at ctc_lcs.c(344)
HHC01463E 0:0F01 device initialization failed
HHC00007I Previous message from function 'attach_device' at config.c(1301)
Exiting the user running Hercules and renewing the session (su - userid) works fine. The LCS devices are working again.
Config is now shirked to some DASDs in order to keep the log small.
I'm guessing "shirked to some DASDs" means you've simply removed some of your dasd devices from your Hercules configuration file.
Thread 10 "LCS_PortThread" received signal SIGUSR2, User defined signal 2.
[Switching to Thread 0x7fffd77ca700 (LWP 28243)]
0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
I'm guessing the segmentation fault occurred at that point, yes?
(gdb) backtrace
#0 0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
#1 0x00007fffd77cfa06 in LCS_PortThread (arg=arg@entry=0x7b9630) at ctc_lcs.c:2280
#2 0x00007ffff6dcae1d in hthread_func (arg2=0x7cea50) at hthreads.c:796
#3 0x00007ffff5bda559 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffff591181f in clone () from /lib64/libc.so.6
(gdb)
Hmmm... It looks to me like it's a bug in your system's tuntap device driver, not Hercules. Hercules is simply calling the close() function for the tuntap device, and for whatever reason, the call to close() is crashing. I don't think there's much we can do about that!
My suspicion that it's a bug in your system's tuntap device driver seems to be confirmed by your later comment:
Restarting Hercules after the SEGMENTATION FAULT leads to following error messages during start up of Hercules:
HHC00138E Error setting TUN/TAP mode : Interrupted system call
HHC00007I Previous message from function 'TUNTAP_CreateInterface' at tuntap.c(269)
HHC00900E 0:0F00 LCS: Error in function TUNTAP_CreateInterface: Unknown error -1
HHC00007I Previous message from function 'LCS_Init' at ctc_lcs.c(344)
HHC01463E 0:0F01 device initialization failed
HHC00007I Previous message from function 'attach_device' at config.c(1301)
Which indicates to me an obvious error/problem (i.e. "bug!") in your system's tuntap device driver. If Hercules is able to successfully open the tuntap device and set the mode during its previous attempts, but is unable to do the exact same thing on subsequent attempts (with the reported cause being "Interrupted system call" and "Unknown error -1"), then it sure appears to me as if your tuntap device is somehow borked (broken).
This is further confirmed by:
Exiting the user running Hercules and renewing the session (su - userid) works fine. The LCS devices are working again.
(where I'm presuming that: "exiting the user running Hercules and renewing the session" translates to: "I logged out of my userid (i.e. returned back to my system's login screen) and logged in again". Yes?)
Which would seem to indicate that doing so somehow managed to fix whatever problem there was with your tuntap device. (I'm guessing the same fix would have occurred if you had rebooted your system too.) Because after you did that (logged out and then logged back in again), now your LCS devices are working again! (I presume that means Hercules is no longer crashing, yes? Please let me know if that's not true. Please let me know if the problem is still present, i.e. please let me know if Hercules is still crashing when you do a quit.)
Presuming I'm understanding you correctly, I'm going to mark this issue as "Unknown" (since it doesn't look like a Hercules bug) as well as "Close pending" until I hear back from you telling me whether my stated presumptions are correct or not.
(Very weird... Did you maybe apply some maintenance (system updates) that updated your tuntap driver and then forget to reboot? Or something similar? I know Linux is known for its stability and for rarely needing a reboot, but maybe this is one instance where a reboot was required and you didn't do it? Hey! I'm just speculating! I'm not a Linux person!)
Well it seems really an issue with opensuse but stopping Hercules after a new start of the session leads again to the same error and it happens only with the LCS device. With OSA which I guess uses the same tun/tap the crash does not happen at all.
However it does also not happen with Hyperion 4.0.0 running on exactly this system. SDL Hyperion is installed in a different directory. The versions are also built on this system with the same options. So I can switch easily between them.
What I can try is to run Hercules as root to see if it is a privilege problem.
Currently I am installing z/VM 6.4 as second level system. This will take a day or so. I would like to see if it will IPL.
Well it seems really an issue with opensuse but stopping Hercules after a new start of the session leads again to the same error and it happens only with the LCS device.
Dang! :(
Okay, then there's obviously still some problem (some unknown bug) somewhere in SDL Hyperion's LCS handler that seems to only impact some Linux distributions (e.g. OpenSUSE in your case), so I'm re-opening this issue again.
With OSA which I guess uses the same tun/tap the crash does not happen at all.
OSA devices (QETH) use tuntap in 'tun' mode. LCS devices use tuntap in 'tap' mode. So the problem (the bug), whatever it is, is only in the LCS handler's tap handling logic.
However it does also not happen with Hyperion 4.0.0 running on exactly this system.
Well then that would seem to imply a bug was indeed introduced somewhere in SDL Hyperion's LCS handler. It might be with my new offloading code that queries the tuntap device to try and determine whether certain hardware "offloads" are possible or not (e.g. checksum offloading for example). That logic is likely what is triggering this new unexpected/undesirable behavior on certain systems such as yours. I'll have to call in @mcisho and/or @ivan-w for help.
Guys? HELP! :(
Sorry Fish, can't offer any help with this one. Using a Fedora 29 host, SDL & LCS works fine for z/VM 6.1 and z/OS 1.13 guests.
OSA devices (QETH) use tuntap in 'tun' mode.
Only with layer 3; layer 2 uses tuntap in 'tap' mode.
Sorry Fish, can't offer any help with this one. Using a Fedora 29 host, SDL & LCS works fine for z/VM 6.1 and z/OS 1.13 guests.
Oh well. Thanks anyway.
I guess this means there must be something unusual about the way Florian's tuntap device is defined/configured. We need to somehow determine what that "unusualness" is so we can get it fixed.
But it's going to be rather difficult to debug if we're unable to reproduce the problem. :(
OSA devices (QETH) use tuntap in 'tun' mode.
Only with layer 3; layer 2 uses tuntap in 'tap' mode.
(Oops!) You're right. I forgot about that. Thanks for reminding me.
Well, I will reconfigure the system to see if I can run it without LCS. That is probably the simplest solution.
I'm also running opensuse 15.0 and SDL Hercules version 4.2.0.0-SDL-g6cab259d-modified (4.2.0.0) and LCS is running fine. My TAP is configured with bridging.
However, I do get the segmentation fault every time I exit Hercules, which isn't an issue for me. I don't think it's related, but I can't run Hercules as superuser unless I run it from the hyperion folder, and LCS requires superuser. BTW these releases of opensuse and Hercules mark the first time I could run CTCE between two opensuse machines.
However, I do get the segmentation fault every time I exit Hercules ...
Can you provide a gdb backtrace?
However, I do get the segmentation fault every time I exit Hercules ...
Can you provide a gdb backtrace?
Info: https://github.com/SDL-Hercules-390/hyperion/issues/163#issuecomment-449731456
I will give it a try tomorrow.
Backtrace of segmentation fault exiting Hercules. There were errors in the .cnf file which may be causing the problem.
Thread 9 "quit_thread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff443e700 (LWP 6535)]
0x00007ffff5d8e87e in pthread_join () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00007ffff5d8e87e in pthread_join () from /lib64/libpthread.so.0
#1 0x00007ffff6dcb00b in hthread_join_thread (tid=0, prc=prc@entry=0x0, location=location@entry=0x7ffff4a80fa2 "sockdev.c:62") at hthreads.c:826
#2 0x00007ffff4a7f666 in term_sockdev (arg=<optimized out>) at sockdev.c:62
#3 0x00007ffff6dc4767 in hdl_atexit () at hdl.c:683
#4 0x00007ffff7777567 in do_shutdown_now () at hscmisc.c:140
#5 0x00007ffff777ab74 in do_shutdown () at hscmisc.c:211
#6 0x00007ffff77581c5 in quit_thread (arg=arg@entry=0x0) at hsccmd.c:462
#7 0x00007ffff6dc9b9d in hthread_func (arg2=0x646fb0) at hthreads.c:796
#8 0x00007ffff5d8d559 in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff5ac481f in clone () from /lib64/libc.so.6
(gdb)
There were errors in the .cnf file which may be causing the problem.
What type of errors? May we see your Hercules configuration file and your Hercules log file?
@rgschmi Bob: contact me off list (privately) and I'll try to help you with your VS2017 problem. Resolving your VS2017 problem in this GitHub Issue is not the proper place for it. (And neither is the other thread either!)
Bob (@rgschmi) wrote:
Thread 9 "quit_thread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff443e700 (LWP 6535)]
0x00007ffff5d8e87e in pthread_join () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00007ffff5d8e87e in pthread_join () from /lib64/libpthread.so.0
#1 0x00007ffff6dcb00b in hthread_join_thread (tid=0, prc=prc@entry=0x0, location=location@entry=0x7ffff4a80fa2 "sockdev.c:62") at hthreads.c:826
#2 0x00007ffff4a7f666 in term_sockdev (arg=<optimized out>) at sockdev.c:62
... <snipped> ...
Interesting!
I hadn't noticed this before, but it appears the crash (SIGSEGV), at least for you, Bob, is occurring in sockdev.c's pthread_join() call, not in LCS code!
The gdb backtrace that Florian (@fbi-ranger) provided was for the SIGUSR2 signal that the LCS_PortThread receives as part of its normal close processing:
Thus I am inclined to believe the backtrace that Florian provided is actually an unintended "red herring" (i.e. false lead, i.e. misleading clue), and that the problem might actually be in our sockdev code, and not our LCS code as originally believed. That is to say, the sockdev bug, whatever it is, only happened to also impact LCS code too for some as-yet-unknown networking reason.
Maybe...
I'm not sure...
I'm just guessing at this point!
Florian? (@fbi-ranger) Are you also maybe using a sockdev device too like Bob obviously is? A socket printer perhaps? If you are, then that would lend weight to my theory. Please let us know whether you are also using a sockdev device or not?
In the meantime, Bob? (@rgschmi) Can you do me a favor and try again without any sockdev devices in your configuration? If it works (if no crash occurs), then that too would also lend weight to my theory.
Thanks!
Fish, No, sorry I do not have any other sockdev devices in my configuration.
To me this behavior looks as if a "subtask" (thread) corrupts some internal control blocks, so that the "close" of the LCS can no longer complete because it has a broken control structure. But this view dates back to my old days as a programmer, with a crashing CICS program in front of me, before we had storage protection in CICS. ;-)
In my configuration the segfault happened only when LCS devices are part of the config.
Fish, No, sorry I do not have any other sockdev devices in my configuration.
Dang. I thought maybe I was onto something. Oh well. :(
Can you please do another gdb backtrace? The first one you provided was for a SIGUSR2 signal, which doesn't help. The SIGUSR2 is normal and doesn't tell us anything. I need to see the backtrace for the SIGSEGV, like what Bob provided.
I'm not familiar with gdb, but there is probably some way to tell gdb to "please ignore this signal and continue" whenever the SIGUSR2 occurs, which should hopefully lead to the eventual SIGSEGV, which is the event we need to see the backtrace for.
Thanks.
I'm not familiar with gdb, but there is probably some way to tell gdb to "please ignore this signal and continue" whenever the SIGUSR2 occurs...
FYI: I found the following:
It appears you can do either 1 or 2 (or both):
(or both)
I hope that helps!
(I want to see if your SIGSEGV backtrace is the same as Bob's)
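As a hedged sketch of the "ignore this signal and continue" idea: gdb's handle command accepts the keywords nostop, noprint and pass, which together let SIGUSR2 reach Hercules without gdb stopping on it. The command file name herc-sigsegv.gdb is made up for illustration, and the config path is simply the one from the earlier log.

```shell
#!/bin/sh
# Hypothetical file name for the gdb command file.
GDB_SCRIPT="herc-sigsegv.gdb"

# Write the gdb commands: let SIGUSR2 through silently, run Hercules,
# then print a backtrace when gdb next stops (hopefully at the SIGSEGV).
cat > "$GDB_SCRIPT" <<'EOF'
handle SIGUSR2 nostop noprint pass
run
backtrace
EOF

# Show how gdb would be started with that command file
# (-x tells gdb to read its commands from a file).
echo "gdb -x $GDB_SCRIPT --args hercules -f z390/etc/hercmini.cnf"
```

The same handle line can also be typed at the (gdb) prompt before entering run.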
Fish, thanks for helping me with gdb. Here is what happens when I start Hercules and immediately end it by entering quit:
HHC00414I 0:6500 CKD file /local/sys1/s390/dasd/NBCSA1.6500: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6501 CKD file /local/sys1/s390/dasd/NBCSA2.6501: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6502 CKD file /local/sys1/s390/dasd/NBCSA3.6502: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6503 CKD file /local/sys1/s390/dasd/NBCSA4.6503: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6506 CKD file /local/sys1/s390/dasd/NBCC01.6506: cyls 1113 heads 15 tracks 16695
HHC00414I 0:6507 CKD file /local/sys1/s390/dasd/NBCC02.6507: cyls 1113 heads 15 tracks 16695
HHC00414I 0:6508 CKD file /local/sys1/s390/dasd/NBCC03.6508: cyls 1113 heads 15 tracks 16695
HHC00414I 0:650A CKD file /local/sys1/s390/dasd/NMCT01.650A: cyls 1113 heads 15 tracks 16695
HHC00414I 0:650B CKD file /local/sys1/s390/dasd/NMCT02.650B: cyls 1113 heads 15 tracks 16695
HHC00414I 0:650C CKD file /local/sys1/s390/dasd/NMCT03.650C: cyls 1113 heads 15 tracks 16695
HHC00414I 0:6510 CKD file /local/sys1/s390/dasd/NBCC10.6510: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6511 CKD file /local/sys1/s390/dasd/NBCC11.6511: cyls 10017 heads 15 tracks 1502
HHC00414I 0:651C CKD file /local/sys1/s390/dasd/NLXC04.651C: cyls 10017 heads 15 tracks 1502
HHC00414I 0:651D CKD file /local/sys1/s390/dasd/NLXC03.651D: cyls 10017 heads 15 tracks 1502
HHC00414I 0:651E CKD file /local/sys1/s390/dasd/NLXC02.651E: cyls 10017 heads 15 tracks 1502
HHC00414I 0:651F CKD file /local/sys1/s390/dasd/NLXC01.651F: cyls 10017 heads 15 tracks 1502
HHC00414I 0:6600 CKD file /local/sys1/s390/dasd/NSMS01.6600: cyls 30051 heads 15 tracks 4507
HHC00414I 0:6601 CKD file /local/sys1/s390/dasd/NSMS02.6601: cyls 30051 heads 15 tracks 4507
HHC01603I quit
HHC00101I Thread id 00007ffff45c1700, prio -1, name 'http_server' ended
[Thread 0x7ffff45c1700 (LWP 12206) exited]
HHC00101I Thread id 00007ffff48c4700, prio -1, name 'Processor CP00' ended
[Thread 0x7ffff48c4700 (LWP 12201) exited]
HHC00101I Thread id 00007ffff4ac7700, prio -1, name 'Processor CP01' ended
[Thread 0x7ffff4ac7700 (LWP 12203) exited]
HHC00101I Thread id 00007ffff49c6700, prio -1, name 'Processor CP02' ended
[Thread 0x7ffff49c6700 (LWP 12204) exited]
HHC00101I Thread id 00007ffff46c2700, prio -1, name 'Processor CP03' ended
[Thread 0x7ffff46c2700 (LWP 12205) exited]
**Thread 11 "LCS_PortThread" received signal SIGSEGV, Segmentation fault.**
[Switching to Thread 0x7ffff41ad700 (LWP 12210)]
0x00007ffff4e5ff6f in ?? () from /usr/lib64/libregina.so
(gdb) backtrace
#0 0x00007ffff4e5ff6f in ?? () from /usr/lib64/libregina.so
#1 0x00007ffff4e5ddcd in ?? () from /usr/lib64/libregina.so
#2 0x00007ffff4e1801d in ?? () from /usr/lib64/libregina.so
#3 <signal handler called>
#4 0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
#5 0x00007fffd75cba06 in LCS_PortThread (arg=arg@entry=0x83b380) at ctc_lcs.c:2280
#6 0x00007ffff6dcad2d in hthread_func (arg2=0x850610) at hthreads.c:796
#7 0x00007ffff5bda559 in start_thread () from /lib64/libpthread.so.0
#8 0x00007ffff591181f in clone () from /lib64/libc.so.6
(gdb)
Hope this helps you further.
Interesting!
**Thread 11 "LCS_PortThread" received signal SIGSEGV, Segmentation fault.**
[Switching to Thread 0x7ffff41ad700 (LWP 12210)]
0x00007ffff4e5ff6f in ?? () from /usr/lib64/libregina.so
(gdb) backtrace
#0 0x00007ffff4e5ff6f in ?? () from /usr/lib64/libregina.so
#1 0x00007ffff4e5ddcd in ?? () from /usr/lib64/libregina.so
#2 0x00007ffff4e1801d in ?? () from /usr/lib64/libregina.so
#3 <signal handler called>
#4 0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0
It appears that for some unknown reason Regina REXX is crashing!
May I see the beginning of your Hercules logfile where Rexx is being loaded? It's the part of the logfile that looks similar to the following:
HHC17528I REXX(OORexx) VERSION: REXX-ooRexx_4.2.0(MT)_64-bit 6.04 22 Feb 2014
HHC17529I REXX(OORexx) SOURCE: WindowsNT
HHC17525I REXX(OORexx) Rexx has been started/enabled
HHC17500I REXX(OORexx) Mode : Subroutine
HHC17500I REXX(OORexx) MsgLevel : Off
HHC17500I REXX(OORexx) MsgPrefix : Off
HHC17500I REXX(OORexx) ErrPrefix : Off
HHC17500I REXX(OORexx) Resolver : On
HHC17500I REXX(OORexx) SysPath (46) : On
HHC17500I REXX(OORexx) RexxPath ( 0) :
HHC17500I REXX(OORexx) Extensions ( 8) : .REXX;.rexx;.REX;.rex;.CMD;.cmd;.RX;.rx
Some things to try:
1. Before starting Hercules, define the environment variable HREXX_PACKAGE and set it to the value none. This should prevent Rexx from being loaded. Does the crash still occur?
2. Try installing ooRexx instead of Regina Rexx, and set HREXX_PACKAGE to ooRexx. Does the crash still occur?
Thanks!
2. ... ooRexx instead of Regina Rexx ...
Or ... in addition to Regina rexx.
That is to say, if you don't wish to uninstall Regina, you can still install ooRexx too (i.e. you can have both Rexxes installed at the same time and tell Hercules which one you want to use at runtime. Refer to the README.REXX document).
Fish,
I've tried to reproduce the segmentation fault, with and without sockdev devices to no avail. I am currently running OpenSUSE 15.0 and Hercules version 4.2.0.0-SDL-gcddb23fc-modified (4.2.0.0). As I mentioned in an email to you, I was NOT running an 'official' version of Hercules when I had the segmentation fault. I tried a formal shutdown of z/OS, a quiesce, and exiting Hercules with z/OS running, all with a printer and a 3390 connected via sockdev to no avail.
... to no avail.
By "to no avail" I take it to mean you were unable to recreate the crash, correct? That is to say, your system (OpenSUSE 15.0, the same as what Florian is running) does not crash, regardless of whether you have an LCS device in your configuration or not and regardless of whether you have any sockdev devices or not, correct? In other words, your system always runs just fine, yes?
Bob, do you have Regina Rexx installed on your system? If not, can you (temporarily?) install it and see whether that makes any difference or not? Florian has Regina Rexx installed on his system and it appears that's where the crash is occurring. I'm trying to determine (confirm or deny) whether or not it's Regina Rexx that is causing the crash. Thanks!
@fbi-ranger , @rgschmi (Florian and Bob)
It appears both of you are running fairly old versions of SDL Hyperion 4.2.
It would be very helpful if both of you would do a git pull to pick up the latest and greatest version and try again. I want to make sure I (we) haven't been wasting time chasing a non-existent bug!
Fish,
Here are the REXX initialization messages:
HHC17528I REXX(Regina) VERSION: REXX-Regina_3.9.1 5.00 5 Apr 2015
HHC17529I REXX(Regina) SOURCE: UNIX
HHC17525I REXX(Regina) Rexx has been started/enabled
HHC17500I REXX(Regina) Mode : Command
HHC17500I REXX(Regina) MsgLevel : Off
HHC17500I REXX(Regina) MsgPrefix : Off
HHC17500I REXX(Regina) ErrPrefix : Off
HHC17500I REXX(Regina) Resolver : On
HHC17500I REXX(Regina) SysPath ( 6) : On
HHC17500I REXX(Regina) RexxPath ( 0) :
HHC17500I REXX(Regina) Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx
Setting HREXX_PACKAGE=none does not help at all. The crash still occurs:
Thread 11 "LCS_PortThread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff41a8700 (LWP 3036)]
0x00007ffff4e5af6f in ?? () from /usr/lib64/libregina.so
(gdb) backtrace
#0 0x00007ffff4e5af6f in ?? () from /usr/lib64/libregina.so
#1 0x00007ffff4e58dcd in ?? () from /usr/lib64/libregina.so
#2 0x00007ffff4e1301d in ?? () from /usr/lib64/libregina.so
#3 <signal handler called>
#4 0x00007ffff5bdef2c in close () from /lib64/libpthread.so.0
#5 0x00007fffd75cab86 in LCS_PortThread (arg=0x843380) at ctc_lcs.c:2280
#6 0x00007ffff6dadd32 in hthread_func (arg2=0x869890) at hthreads.c:797
#7 0x00007ffff5bd5559 in start_thread () from /lib64/libpthread.so.0
#8 0x00007ffff590c81f in clone () from /lib64/libc.so.6
Since the ./configure '--disable-regina-rexx' option is not working, I had to uninstall Regina REXX:
HHC00109E set_thread_priority( 5 ) failed: Operation not permitted
HHC00007I Previous message from function 'impl' at impl.c(848)
HHC00110W Defaulting all threads to priority 1
HHC00007I Previous message from function 'impl' at impl.c(851)
HHC00100I Thread id 00007fd5e17aa740, prio -1, name 'impl_thread' started
HHC00100I Thread id 00007fd5deff8700, prio -1, name 'logger_thread' started
HHC01413I Hercules version 4.2.0.0-SDL-g0f1b54b5-modified (4.2.0.0)
HHC01414I (C) Copyright 1999-2019 by Roger Bowler, Jan Jaeger, and others
HHC01417I YBI-15007-9623
HHC01415I Build date: Mar 31 2019 at 15:44:48
HHC01417I Built with: GCC 7.3.1 20180323 [gcc-7-branch revision 258812]
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Modes: S/370 ESA/390 z/Arch
HHC01417I Max CPU Engines: 12
HHC01417I Using shared libraries
HHC01417I Using setresuid() for setting privileges
HHC01417I Using POSIX threads Threading Model
HHC01417I Using Error-Checking Mutex Locking Model
HHC01417I With Shared Devices support
HHC01417I With Dynamic loading support
HHC01417I With External GUI support
HHC01417I With IPV6 support
HHC01417I With HTTP Server support
HHC01417I With sqrtl support
HHC01417I With SIGABEND handler
HHC01417I With CCKD BZIP2 support
HHC01417I With HET BZIP2 support
HHC01417I With ZLIB support
HHC01417I With Regular Expressions support
**HHC01417I Without Object REXX support
HHC01417I Without Regina REXX support**
HHC01417I With Automatic Operator support
HHC01417I Without National Language Support
HHC01417I With CCKD64 Support
HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 hatomics=C11
HHC01417I Running on: hercules (Linux-4.12.14-lp150.12.48-default x86_64) MP=8
HHC01417I Built with crypto external package version 1.0.0.27-ga3e07b5
HHC01417I Built with decNumber external package version 3.68.0.80-gdb5c456
HHC01417I Built with SoftFloat external package version 3.5.0.83-g3da230f
HHC01417I Built with telnet external package version 1.0.0.42-gcaec0ac
HHC00018W Hercules is NOT running in elevated mode
HHC00007I Previous message from function 'impl' at impl.c(906)
HHC00150I Crypto module loaded (C) Copyright 2003-2016 by Bernard van der Helm
HHC00151I Activated facility: Message Security Assist
HHC00151I Activated facility: Message Security Assist Extension 1, 2, 3 and 4
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007fd5de5b6700, prio -1, name 'Processor CP00' started
HHC00811I Processor CP00: architecture mode z/Arch
HHC00100I Thread id 00007fd5de4b5700, prio -1, name 'timer_thread' started
HHC02204I CPUSERIAL set to 1BA2EF
HHC02204I CPUMODEL set to 2827
HHC02204I MODEL set to hardware(H20) capacity(H20) perm() temp()
HHC02204I PLANT set to 01
HHC17003I MAIN storage is 8G (mainsize); storage is not locked
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007fd5de7b9700, prio -1, name 'Processor CP01' started
HHC00811I Processor CP01: architecture mode z/Arch
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007fd5de6b8700, prio -1, name 'Processor CP02' started
HHC00811I Processor CP02: architecture mode z/Arch
HHC00111I Thread CPU Time IS available (_POSIX_THREAD_CPUTIME=0)
HHC00100I Thread id 00007fd5de3b4700, prio -1, name 'Processor CP03' started
HHC00811I Processor CP03: architecture mode z/Arch
HHC02204I NUMCPU set to 4
HHC02204I MANUFACTURER set to IBM
HHC02204I ARCHLVL set to z/Arch
HHC02204I ECPSVM set to disabled
HHC02204I LOADPARM set to
HHC02204I LOADPARM set to +
HHC02204I LPARNAME set to SYSZ01
HHC02204I LPARNUM set to 1
HHC02204I CPUIDFMT set to 0
HHC02204I PANTITLE set to z/VM 6.3 PHOENIX SYSRES 6300
HHC02204I SCPIMPLY set to ON
HHC01474I Using internal codepage conversion table default
HHC02204I DIAG8CMD set to ENABLE NOECHO
HHC02204I PORT set to port=8081 auth userid<herc> password<do1t>
HHC01807I HTTP server signaled to start
HHC02204I PANRATE set to SLOW
HHC00100I Thread id 00007fd5de2b3700, prio -1, name 'http_server' started
HHC01802I HTTP server using root directory /local/sys1/z390/herc15007/share/hercules/
HHC01803I HTTP server waiting for requests on port 8081
HHC00100I Thread id 00007fd5ddfa0700, prio -1, name 'console_connect' started
HHC01024I Waiting for console connections on port 3270
HHC01250E 0:000C Card: error in function access(): No such file or directory
HHC00007I Previous message from function 'cardrdr_init_handler' at cardrdr.c(322)
HHC01463E 0:000C device initialization failed
HHC00007I Previous message from function 'attach_device' at config.c(1301)
HHC00901I 0:0F02 LCS: Interface tap0, type TAP opened
HHC00921I CTC: lcs device port 00: manual Multicast assist enabled
HHC00935I CTC: lcs device port 00: manual Checksum Offload enabled
HHC00224I 0:0760 Tape file *, type aws: display " "
HHC00224I 0:0761 Tape file *, type aws: display " "
HHC00224I 0:0D00 Tape file *, type aws: display " "
HHC00224I 0:0D01 Tape file *, type aws: display " "
HHC00224I 0:0D80 Tape file *, type aws: display " "
HHC00224I 0:0D81 Tape file *, type aws: display " "
The good news is that without Regina REXX installed, the Segment Fault does not happen anymore!
However, it is still a problem whenever Regina Rexx is installed. I don't understand what has Rexx to do with LCS / CTC support? I never used Rexx together with Hercules yet, so only the libraries were linked together.
Setting HREXX_PACKAGE=none does not help at all. The crash still occurs:
Dang! :(
Since the ./configure '--disable-regina-rexx' option is not working ...
Wow. I wasn't aware of that! That's definitely a bug. I'll try to get that fixed for you right away!
... I had to uninstall Regina REXX.
The good news is that without Regina REXX installed, the Segment Fault does not happen anymore!
Which is proof that Regina Rexx is definitely the cause of the problem! WHY, I haven't a clue.
However, it is still a problem whenever Regina Rexx is installed.
Understood. I can't remember having any problems myself when I was testing Hercules Regina Rexx support on my CentOS 6.10 VMware virtual machine, but I might not have tried it with an LCS device defined. I'll have to try it again.
I don't understand what has Rexx to do with LCS / CTC support?
I don't understand it either! It's very weird! It doesn't make any sense!
But during all of my Hercules REXX testing I had nothing but problems with Regina Rexx. It's a POS in my opinion. It's very buggy and very poorly documented. OORexx (Open Object Rexx) on the other hand, is much more stable and much better documented as well. It's just a much better product in my opinion, and is what I have installed on both my Windows host system as well as on both of my CentOS and Macintosh virtual machines too.
I am seriously considering dropping support for Regina Rexx altogether at this point! >8-<
IN SUMMARY: hang loose for a day or two(?) while I try to fix the '--disable-regina-rexx' configure bug. I'll let you know when the fix is committed so you can then build your Hercules without Regina rexx support from now on.
p.s. Have you tried installing ooRexx yet?
I have Regina Rexx (and ooRexx) installed, and I have not experienced the Segment Fault. Also, ./config --disable-regina-rexx (and --disable-object-rexx) work fine for me.
I have Regina Rexx (and ooRexx) installed, and I have not experienced the Segment Fault. Also, ./config --disable-regina-rexx (and --disable-object-rexx) work fine for me.
Thanks for that report, Ian! As I mentioned in my previous comment I too had both installed (as well as only one or the other too) during my testing and don't recall experiencing any crashes. But then I can't remember whether any of the tests I did were done with an LCS device in my configuration either, so I guess that doesn't mean much.
The problem may well be limited to openSuse 15 however. I don't know. I haven't heard back from Bob yet who I believe is also running openSuse 15.
What distro are you using, Ian?
I'm using Fedora 29.
I have the opposite view of the Rexxes. I much prefer Regina, it's closer to Cowlishaw's vision, and isn't full of weird extensions. And I find the Regina manual easy to use and follow. Admittedly, I've never tried using Regina (or ooRexx) with Hercules, haven't had any need for either.
In one of the traces wasn't there a signal between a close and Regina becoming involved? LCS does do a SIGUSR2, perhaps it was being intercepted by Regina's signal handlers?
EDIT: This comment is BOGUS! (Doh!)
Version 0f1b54b5... is David Durand's pull request that I merged which I hadn't pulled into my local repository yet! (Doh!)
Version 0f1b54b5... is the most current version!
My bad! Sorry! :(
@fbi-ranger
FLORIAN!
IMPORTANT!
The version of SDL Hyperion 4.2 that you are using IS BOGUS!!
According to your Hercules logfile that you posted:
HHC01413I Hercules version 4.2.0.0-SDL-g0f1b54b5-modified (4.2.0.0)
you are using a BOGUS/UNKNOWN version!!
The git hash "0f1b54b5..." DOES NOT EXIST anywhere in the official SDL Hyperion 4.2 repository's commit history! I have no idea where you got your version of Hercules from, but it is bad! (bogus!)
Please delete your SDL Hercules installation and clone the official SDL Hyperion version 4.2 from GitHub:
and then rebuild and try your test again!
I much prefer Regina, it's closer to Cowlishaw's vision, and isn't full of weird extensions. And I find the Regina manual easy to use and follow.
To each their own. :)
In one of the traces wasn't there a signal between a close and Regina becoming involved?
Yes, I saw that too.
LCS does do a SIGUSR2, perhaps it was being intercepted by Regina's signal handlers?
It wouldn't surprise me in the least!
However... the SIGUSR2 signal should be being (consumed?) (ignored?) in our signal handling function sigabend_handler, so it shouldn't be being passed on to Regina, yes?
(I'm not very experienced with, nor knowledgeable about, Unix signal handling!)
The sigabend_handler function just returns whenever SIGUSR2 is received, which ignores (consumes?) the signal, yes? In order to "pass it on" you need to do:
signal( signo, SIG_DFL );
raise( signo );
which sigabend_handler is not doing for SIGUSR2. So if I'm understanding Unix signal handling correctly, Regina should not even be receiving the signal at all! Yes? Why the gdb backtrace shows it I don't know. I'm not a Linux person.
I've just installed rexx (Regina because I've used it in Windows), and will git and build the latest Hercules. I've not used rexx with Hercules yet, though it's on my to-do list. Is there a simple rexx test I can run for you?
I've installed Hercules version 4.2.0.0-SDL-g0f1b54b5-modified (4.2.0.0) and am getting seg faults every time I exit Hercules, with or without sockdev devices or without even starting a guest z/OS.
I do have Regina rexx enabled, but didn't exec any rexx scripts.
It fails even without LCS devices defined.
This was not happening with the previous version of Hercules. I was unable to get a segfault with anything I tried, with or without sockdev.
HHC01413I Hercules version 4.2.0.0-SDL-g0f1b54b5-modified (4.2.0.0)
HHC01414I (C) Copyright 1999-2019 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SoftDevLabs version of Hercules **
HHC01415I Build date: Mar 31 2019 at 15:29:10
HHC01417I Built with: GCC 7.3.1 20180323 [gcc-7-branch revision 258812]
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Modes: S/370 ESA/390 z/Arch
HHC01417I Max CPU Engines: 64
HHC01417I Using shared libraries
HHC01417I Using setresuid() for setting privileges
HHC01417I Using POSIX threads Threading Model
HHC01417I Using Error-Checking Mutex Locking Model
HHC01417I With Shared Devices support
HHC01417I With Dynamic loading support
HHC01417I With External GUI support
HHC01417I With IPV6 support
HHC01417I With HTTP Server support
HHC01417I With sqrtl support
HHC01417I With SIGABEND handler
HHC01417I Without CCKD BZIP2 support
HHC01417I Without HET BZIP2 support
HHC01417I With ZLIB support
HHC01417I With Regular Expressions support
HHC01417I Without Object REXX support
HHC01417I With Regina REXX support
HHC01417I With Automatic Operator support
HHC01417I Without National Language Support
HHC01417I With CCKD64 Support
HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 hatomics=C11
HHC01417I Running on: Suse1 (Linux-4.12.14-lp150.12.45-default x86_64) MP=2
HHC01417I Built with crypto external package version 1.0.0.27-ga3e07b5
HHC01417I Built with decNumber external package version 3.68.0.80-gdb5c456
HHC01417I Built with SoftFloat external package version 3.5.0.83-g3da230f
HHC01417I Built with telnet external package version 1.0.0.42-gcaec0ac
HHC01603I exit
Segmentation fault (core dumped)
root@Suse1:/home/rgschmi>
@fbi-ranger
Florian, please IGNORE my previous comment. The version you are using (0f1b54b5...) is indeed the correct version. I apologize for the confusion! :(
But during all of my Hercules REXX testing I had nothing but problems with Regina Rexx. It's a POS in my opinion. It's very buggy and very poorly documented. OORexx (Open Object Rexx) on the other hand, is much more stable and much better documented as well.
I have the opposite view of the Rexxes. I much prefer Regina, it's closer to Cowlishaw's vision, and isn't full of weird extensions. And I find the Regina manual easy to use and follow.
FYI: ooRexx is IBM's Rexx, whereas Regina Rexx is not:
And:
[fish@centos-64 ~]$ rexx -v
Open Object Rexx Version 4.2.0
Build date: Dec 31 2013
Addressing Mode: 64
Copyright (c) IBM Corporation 1995, 2004.
Copyright (c) RexxLA 2005-2013.
All Rights Reserved.
This program and the accompanying materials are made available under
the terms of the Common Public License v1.0 which accompanies this
distribution or at
http://www.oorexx.org/license.html
[fish@centos-64 ~]$
Regina Rexx is maintained by Mark Hessling, not IBM.
I personally have had nothing but problems with Regina whereas I have hardly had any problems at all with ooRexx. It just works. (whereas Regina Rexx frequently doesn't!)
@rgschmi @fbi-ranger
Bob: Florian:
How did you "install" Regina?
I seem to recall that you need to install the Regina-REXX-lib rpm first, then the Regina-REXX-devel rpm, and finally the Regina-REXX rpm last. (And then build Hercules.)
Also, does Hercules behave any differently when you start rexx manually before attempting to start Hercules for the first time? That is to say, after logging on to your Linux session (userid), enter the command rexx -v (or rexx --version?) and then, afterwards, try starting Hercules for the first time.
I seem to recall Regina (as well as ooRexx too?) comes with a daemon that must be running before rexx will behave properly, and the daemon is not started until you run rexx for the first time.
I have the command rexx -v in my bash profile so the daemon is automatically started every time I logon.
Under openSUSE you install normally via YAST, which means, you do not decide which rpm is chosen first as the sequence depends on the rpm install list (dependencies).
As far as I understand, the daemon is used to get access to rexx queues and, at least AFAIK, is automatically integrated in the startup procedures. I don't know if it has any function other than the rxqueue(s).
It is registered in the service manager and normally starts automatically (without any rexx invocation):
ps -ef | grep rx
root 26289 1 0 18:26 ? 00:00:00 /usr/bin/rxstack -d
# rexx -v
rexx: REXX-Regina_3.9.1 5.00 5 Apr 2015 (64 bit)
The reason I use Regina REXX is that it has a convenient way of putting outputs from system commands to a queue, which ooRexx does not have or at least didn't have when I was trying to play with it many years ago.
I agree with Ian that Regina REXX is more close to the original REXX. As I am coming from the mainframe, I wanted as much as possible the same functionality as under VM/CMS and not having weird constructions via temporary files etc.
Maybe this has changed in ooREXX in the meantime, I don't know. Under Linux I now use Perl instead of REXX. Therefore I could easily give Regina up if that solves the LCS problem. However, this is surely not an acceptable solution.
Regarding invocation of REXX libraries: Couldn't it be the problem that all libraries are statically linked during link phase of Hercules install process? Maybe that is why even I do not use REXX together with Hercules they are loaded and active?
Under openSUSE you install normally via YAST, which means, you do not decide which rpm is chosen first as the sequence depends on the rpm install list (dependencies).
As you know, I do not know a lot about Linux. I am relatively inexperienced. But as far as I know, a default install of rexx only installs the components necessary to use rexx, i.e. it only installs the components to be able to run (execute) rexx scripts.
But as far as I know a default install does not install the components needed to do rexx development, i.e. it does not install the components needed to be able to write programs that call directly into rexx internal functions, etc, i.e. it does not install the components needed to link your program with rexx itself, so your program can execute rexx scripts by directly calling into internal rexx functions like the way Hercules does. (Hercules does not fork a separate process to run rexx scripts. It calls internal rexx functions to ask it to "please execute this script" and passes statements to it, etc.) To do rexx development like the way Hercules needs to do, you have to install the "lib" and "devel" packages. Hercules does not call rexx externally. Instead, rexx support is integrated directly into Hercules. Thus the need for the "lib" and "devel" packages.
The reason I use Regina REXX is that it has a convenient way of putting outputs from system commands to a queue, which ooRexx does not have or at least didn't have when I was trying to play with it many years ago.
Is placing the output from system commands into a queue something that the rexx language supports? Is doing that part of the language? Is doing that part of standard rexx? If the answer is yes, then I'm sure ooRexx supports it! And I'm sure it supports it in the standard language-defined way too!
The rexx language is well defined, defining exactly how each command, statement, function is supposed to behave. If placing the output of system commands into a queue is something that the rexx language defines (if doing that is something supported by the rexx language), then any product claiming to be a rexx interpreter must obviously support it (and support it in the defined manner) or else that product cannot be called a valid rexx language interpreter!
Given that ooRexx is written and copyrighted by IBM themselves, I personally would trust ooRexx more than I would Regina! (which is not an official IBM product and thus far less likely to conform to the original Rexx language which was invented by IBM).
I would personally trust IBM themselves more to write a rexx interpreter that conformed to the original Rexx language of Cowlishaw (who is an IBM Fellow and who invented Rexx while working at IBM!) than I would someone like Anders Christensen or Mark Hessling, who as far as I know did not work at IBM.
I agree with Ian that Regina REXX is more close to the original REXX.
How can a product not written by IBM be "closer to the original REXX" than a product that was written by IBM? (especially given that the "original REXX" was something that was written by IBM!)
How can a non-IBM product conform to an IBM product better than IBM's own product?! That doesn't make any sense!
As I am coming from the mainframe, I wanted as much as possible the same functionality as under VM/CMS and not having weird constructions via temporary files etc.
I am not familiar with these "weird constructions via temporary files" that you mention. What are you talking about?
And as far as I know, both Regina and ooRexx both "provide the same functionality as under VM/CMS".
Is there something that ooRexx does that is not the same as under VM/CMS?? I doubt it! I am very confident that ooRexx -- an IBM product! -- provides the same functionality as under VM/CMS!
Regarding invocation of REXX libraries: Couldn't it be the problem that all libraries are statically linked during link phase of Hercules install process?
Rexx support is linked into Hercules by default, yes, but only if the needed headers (and libraries?) are found.
Maybe that is why even I do not use REXX together with Hercules they are loaded and active?
If you have Rexx installed (Regina OR ooRexx (or both!)) then Hercules will build itself with Rexx support.
Perhaps our default should be to NOT provide Rexx support by default? I.e. perhaps Hercules rexx support should only be provided by specific request? (e.g. via a --ENABLE-regina-rexx configure option?) That is perhaps something we could discuss elsewhere.
For the purpose of this issue however (which is trying to fix your crash when LCS devices are used when Regina Rexx is installed), have you tried uninstalling Regina, and then manually installing ALL of the Regina packages, including the "lib" and "devel" packages too? (which I believe must be installed first, then the normal/default Regina package last).
Maybe if you do that it won't crash any more? Maybe? I don't know. It's just wishful thinking.
Personally I believe Regina has either a race condition or, more likely, is erroneously (incorrectly) receiving and processing Hercules's SIGUSR2 signal, and that is what is causing it to crash: a poorly (improperly) written non-IBM rexx interpreter.
@rgschmi wrote:
I've installed Hercules version 4.2.0.0-SDL-g0f1b54b5-modified (4.2.0.0) and am getting seg faults every time I exit Hercules, with or without sockdev devices or without even starting a guest z/OS.
I do have Regina rexx enabled, but didn't exec any rexx scripts.
It fails even without LCS devices defined.
This was not happening with the previous version of Hercules. I was unable to get a segfault with anything I tried, with or without sockdev.
Bob:
Have you tried:
(and/or)
./configure option: --disable-regina-rexx to prevent Hercules from trying to use Regina/rexx?
Either of those techniques should prevent Hercules from crashing if Regina is indeed the culprit (which it appears it is).
I do have Regina rexx enabled, but didn't exec any rexx scripts.
Does not matter. If Regina is installed, Hercules, by default, will be built with rexx support. You do not have to actually use Hercules's rexx support, but it will be there (which is apparently the problem) (at least with Regina anyway; ooRexx AFAIK doesn't have this problem).
FYI to those who still prefer Regina over IBM's own ooRexx product (in case you missed it):
(Excerpt):
... it's definitely Hercules's logger_init redirection logic that is confusing poor Regina. OORexx works flawlessly, with or without redirection, but Regina unfortunately doesn't.
Regina sucks!
ooRexx rocks!
(IMHO)
More evidence of Regina's bugginess/inferiority:
Trying to migrate to SDL Hercules 4.2, I face the problem that ending Hercules by entering 'quit' on the herc ====> command line, the program terminates with a segmentation fault:
A correct quit looks like:
Checking my configuration file, I have determined the following triggers the problem:
In cases where the LCS statement is active, the segmentation fault happens. If it is commented out then Hercules stops properly.
Doing a CCW trace on devices F00 and F01 shows the following: