SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

'make check' tests are failing when Regina REXX is used. #133

Closed mhoes closed 6 years ago

mhoes commented 6 years ago

(Fish note: this issue is essentially a duplicate of issue #136: "make check fails on Test "3211 printer"")


The 'sske' test in the 'make check' test suite fails for me on Linux (verified on latest git):

$ tests/runtest - sske
Files: tests/sske.tst
Variable $ptrsize            set to "8"
Variable $platform           set to "Linux"
Variable $can_s370_mode      set to "1"
Variable $can_esa390_mode    set to "1"
Variable $can_zarch_mode     set to "1"
Variable $max_cpu_engines    set to "64"
Variable $libraries          set to "Shared"
Variable $threading_model    set to "POSIX"
Variable $locking_model      set to "Error"
Variable $shared_devices     set to "1"
Variable $HDL                set to "1"
Variable $externalgui        set to "1"
Variable $IPV6               set to "1"
Variable $HTTP               set to "1"
Variable $sqrtl              set to "1"
Variable $SIGABEND           set to "1"
Variable $CCKD_BZIP2         set to "1"
Variable $HET_BZIP2          set to "1"
Variable $ZLIB               set to "1"
Variable $regex              set to "1"
Variable $rexx_supported     set to "1"
Variable $HAO                set to "1"
Variable $NLS                set to "0"
Variable $cmpxchg1           set to "1"
Variable $cmpxchg4           set to "1"
Variable $cmpxchg8           set to "1"
Variable $hatomics           set to "C11"
Test "sske#1":  2 OK compares.  All pass.
>>>>> line   180: Gpr 2 compare mismatch.
                  Want: 2004
                  Got:
>>>>> line   325: Gpr 2 compare mismatch.
                  Want: 0000000000100004
                  Got:
>>>>> line   423: Gpr 2 compare mismatch.
                  Want: 0000000000100004
                  Got:
>>>>> line   537: Gpr 2 compare mismatch.
                  Want: FFFFFFFF00100004
                  Got:
>>>>> line   651: Gpr 2 compare mismatch.
                  Want: FFFFFFFF00100004
                  Got:
Test "sske#2":  48 OK compares.  5 failures.
Did 2 tests.  1 failed; 1 OK.

So either:

  1. Something has spectacularly exploded on my Fedora 28 and Arch Linux installations.
  2. There is a potential issue with the code, and the test is correct.
  3. There is a potential issue with the test itself, but the code is correct.

The 'configure' line used is:

./configure --enable-getoptwrapper --enable-debug --enable-ipv6 --enable-cckd-bzip2 --enable-het-bzip2 --enable-object-rexx --enable-regina-rexx --enable-interlocked-access-facility-2=yes

I have attached a (gzipped) allTests.out, in case that helps.

  Hrm. Now that I actually took the time to read the actual output of the test (this may be a red herring), this line doesn't look good, at least:

Variable $locking_model set to "Error"

Fish-Git commented 6 years ago

Hrm. Now that I actually took the time to read the actual output of the test (this may be a red herring), this line doesn't look good, at least:

Variable $locking_model set to "Error"

Sorry to have to disappoint you, but your initial thought was the correct one: it is indeed a red herring:)

The "Error" $locking_model variable value you see is simply reporting that the mutex locking model being used by Hercules is simply the "PTHREAD_MUTEX_ERRORCHECK" type (as opposed to one of the other types (normal, recursive, etc)).

I have downloaded your attached "allTests.out" file and will look at it as soon as I get a chance. (I'm kind of busy working on several other things at the same time right now.)

One thing I noticed that is rather weird are the reported "Got:" values that immediately follow each "Want:" line: they're all blank! (empty!)   (wtf?!)

That's not right! Something very weird is obviously going on with your systems! The sske test works flawlessly for me on both Windows and CentOS 6.10.

p.s. When you report problems like this it's important that the runtest log correspond to the allTests.out file. It looks like the runtest log you reported above is the output of a manual tests/runtest - sske command, whereas the attached "allTests.out" file appears to be for a full run. Look at the line numbers your runtest log is reporting:

>>>>> line   180: Gpr 2 compare mismatch.
>>>>> line   325: Gpr 2 compare mismatch.
>>>>> line   423: Gpr 2 compare mismatch.
>>>>> line   537: Gpr 2 compare mismatch.
>>>>> line   651: Gpr 2 compare mismatch.
Did 2 tests.  1 failed; 1 OK.

It's reporting the failure occurred on lines 180, 325, 423, 537 and 651 of the output file, but yet in the "allTests.out" file you attached, Test "sske#2" doesn't even start until line 124470! So how can the mismatch be occurring on line 180, 325, ... etc? Those lines are all well before when the test actually started!
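
One way to check that a runtest log and an allTests.out file belong to the same run is to look up where a test case starts in the file and compare that with the line numbers in the mismatch messages. A minimal sketch, assuming each test is introduced in allTests.out by a "*Testcase <name>" line as in the log excerpts elsewhere in this thread; the function name testcase_line is hypothetical and not part of Hercules:

```shell
# testcase_line FILE NAME
# Print the line number of the first line in FILE containing "*Testcase NAME".
# Assumption: tests are introduced by "*Testcase" lines in allTests.out.
testcase_line() (
    file=$1
    name=$2
    # -F: fixed string, so "*" and "#" are taken literally
    # -n: prefix each match with its line number
    grep -n -F "*Testcase $name" "$file" | head -n 1 | cut -d: -f1
)

# Example (not run here):
#   testcase_line allTests.out 'sske#2'
```

If the number printed is far larger than the line numbers in the "compare mismatch" messages, the two files most likely come from different runs.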

It's almost impossible to see what's going on or determine where things possibly went wrong when the two files do not correspond to one another. In the future, please make sure the two files you attach are for the same run. Thanks!

After having had a chance to take a quick peek at your "allTests.out" file, it looks like something may be wrong with your Rexx installation. Take a look at lines 46-57:

hRexxapi.c(1272)  HHC17531W REXX(OORexx) dlopen 'librexx.so' failed: /lib64/librexx.so: undefined symbol: RexxQueryQueue
hRexxapi.c(1272)  HHC17531W REXX(Regina) dlopen 'libregina.so' failed: libregina.so: cannot open shared object file: No such file or directory
hRexx.c(495)      HHC17511E REXX() Could not enable either Rexx package
hRexx.c(246)      HHC17511E REXX() Could not enable default Rexx package
hRexx.c(1257)     HHC17500I REXX() Mode            : Command
hRexx.c(1261)     HHC17500I REXX() MsgLevel        : Off
hRexx.c(1265)     HHC17500I REXX() MsgPrefix       : Off
hRexx.c(1269)     HHC17500I REXX() ErrPrefix       : Off
hRexx.c(1273)     HHC17500I REXX() Resolver        : On
hRexx.c(1277)     HHC17500I REXX() SysPath    (10) : On
hRexx.c(1281)     HHC17500I REXX() RexxPath   ( 0) :
hRexx.c(1288)     HHC17500I REXX() Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx

Since runtest requires a working Rexx installation, that might explain the unusual test results you're getting (e.g. the blank "Got:" log lines I mentioned). Please double check to make sure Rexx is installed and working properly on your system.
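
The dlopen failures quoted above can be pulled out of a Hercules log without reading all of it. A rough sketch, assuming the HHC17531W message layout shown in those lines; rexx_dlopen_failures is a hypothetical helper name:

```shell
# rexx_dlopen_failures FILE
# For each HHC17531W line in FILE, print "package: library: reason".
# Assumption: the message looks like
#   ... HHC17531W REXX(<package>) dlopen '<library>' failed: <reason>
rexx_dlopen_failures() (
    grep -F 'HHC17531W' "$1" |
        sed -n "s/.*REXX(\([^)]*\)) dlopen '\([^']*\)' failed: \(.*\)/\1: \2: \3/p"
)

# Example (not run here):
#   rexx_dlopen_failures allTests.out
```

No output means no HHC17531W line matched that layout; it does not prove that Rexx itself works.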

It would also help to know what version of Rexx you are using (although that information should be reported by Hercules upon startup; see the lines I pasted just above). Please show the output of a rexx -v command. Mine reports:

C:\Users\Fish>rexx -v
Open Object Rexx Version 4.2.0
Build date: Feb 22 2014
Addressing Mode: 64

What does yours report?

And finally, it looks like, according to Hercules (refer to your "allTests.out" file again) the version of gcc you're using is version 8.1.1:

version.c(876)    HHC01417I Built with: GCC 8.1.1 20180712 (Red Hat 8.1.1-5)
                                                  ^^^^^^^^

Notice the date: 2018-07-12. That's fairly new. I suspect there may be serious compiler bugs in that version (and in the 7.x series as well (*)).

My own compiler versions as reported by my CentOS 6.10 system (which as I said works perfectly fine) are:

[fish@centos-64 ~]$ clang --version

    clang version 3.4.2 (tags/RELEASE_34/dot2-final)
    Target: x86_64-redhat-linux-gnu
    Thread model: posix

[fish@centos-64 ~]$ gcc --version

    gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So I'm beginning to strongly suspect your problem is either a compiler bug and/or a bad install (or a bad version!) of Rexx.

Until then (i.e. until we can determine positively that it is indeed a Hercules bug), I am going to close this GitHub Issue as being "Invalid" (i.e. not a Hercules bug). Once you have more (much stronger) evidence that the problem is indeed a Hercules problem, feel free to re-open this issue.

Thanks.


(*) I say that because of not only https://gcc.gnu.org/gcc-8/ ("This release is a bug-fix release, containing fixes for regressions in GCC 8.1 relative to previous releases of GCC"), but also because of a problem that someone else recently reported (#127 "Undefined symbol "nix_set_thread_name" upon starting hercules on Ubuntu 18.04") that ended up being a compiler bug too. (gcc seems to have been having a lot of trouble lately.)

Fish-Git commented 6 years ago

Closing due to suspected compiler bug and/or user error (Rexx improperly installed). Feel free to re-open once more convincing evidence of an actual Hercules bug is supplied.

mhoes commented 6 years ago

First of all, thank you for looking into this. It does indeed appear to be the case that my system has spectacularly exploded. ;)

I get that rexx needs to function in order for the tests to complete successfully. However, I assUme that at least some more (or perhaps even all) tests require rexx, which makes me wonder why the only test that fails for me is 'sske' (which is why I singled it out in the description), but all the other ones succeed (which is why I included the output of all tests in 'make check' with allTests.out). Sorry if that caused any confusion.

I think you are on the right track with rexx though, so I investigated.

First, a little more information on what my setup looks like: I have both Regina REXX and Open Object REXX installed at the same time, and add support for them in my Hyperion build by specifying both '--enable-object-rexx' and '--enable-regina-rexx' at configure time (which is part of why I included my 'configure' line in the description). Open Object lives in '/usr/bin/rexx', and Regina in '/usr/local/bin/rexx'.

$ /usr/bin/rexx -v
Open Object Rexx Version 4.2.0
Build date: Oct 10 2016
Addressing Mode: 64

$ /usr/local/bin/rexx -v
/usr/local/bin/rexx: REXX-Regina_3.9.1 5.00 5 Apr 2015 (64 bit)
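
With two interpreters installed under the same command name, it can help to list every "rexx" on the search path in the order the shell would find them. A small sketch; list_on_path is a hypothetical helper, and this only covers the command-line interpreter, not the shared libraries that Hercules loads:

```shell
# list_on_path NAME
# Print every executable file called NAME in the directories of $PATH,
# in search order (the first one printed is the one a bare NAME would run).
list_on_path() (
    name=$1
    # Split $PATH on ":" only, and disable globbing so that
    # directory names are taken literally.
    IFS=:
    set -f
    for dir in $PATH; do
        # An empty PATH entry means the current directory
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# Example (not run here):
#   list_on_path rexx
```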

I solved the error of not finding 'libregina.so' by adding '/usr/local/lib' and '/usr/local/lib64' (where the library lives) to /etc/ld.so.conf.d/, which wasn't there by default. I seem to vaguely remember adding that in the past to an older Fedora installation; I guess I forgot to do that again on my latest Fedora and Arch installations. Once done, that error in the log disappears, but the 'sske' test still fails.
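
The library path fix described above could look roughly like the following. This is a sketch only: the file name usr-local.conf is an assumption (the thread does not say which file was created), and add_ld_paths is a hypothetical helper that takes the target directory as an argument so it can be tried out somewhere harmless first.

```shell
# add_ld_paths CONF_DIR FILE_NAME DIR...
# Write the given library directories, one per line, to CONF_DIR/FILE_NAME.
add_ld_paths() (
    conf_dir=$1
    file_name=$2
    shift 2
    # One directory per line, as ld.so.conf-style files expect
    printf '%s\n' "$@" > "$conf_dir/$file_name"
)

# On a real system this needs root, followed by a cache rebuild (not run here):
#   add_ld_paths /etc/ld.so.conf.d usr-local.conf /usr/local/lib /usr/local/lib64
#   ldconfig
#   ldconfig -p | grep libregina
```

The ".conf" suffix matters on systems whose /etc/ld.so.conf only includes ld.so.conf.d/*.conf, which is the usual default but worth checking.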

But I have no idea why the RexxQueryQueue symbol cannot be found in /lib64/librexx.so.

'ldconfig' reports it as being found/used :

$ ldconfig -p | grep librexx.so
        librexx.so.4 (libc6,x86-64) => /lib64/librexx.so.4
        librexx.so (libc6,x86-64) => /lib64/librexx.so

And it looks like 'nm' shows the symbol is there ?

$ nm -D /lib64/librexx.so | grep RexxQueryQueue
                 U RexxQueryQueue

I haven't got a clue what is going on here. For what it's worth, a very simple 'hello world' example (SAY HELLO) works for both Regina and Open Object REXX.

Final comment: Although it could perfectly well turn out to be a compiler bug, I wouldn't base that assumption on the version of the compiler being used. GCC 4.4.7 is ancient, and it wouldn't surprise me at all if there were more bugs in that unmaintained version than in the latest stable.

Oh, well.

mhoes commented 6 years ago

Minor update: You probably noticed this already, but it's not just one test that fails, there are many. It's just that the output scrolls off the screen so fast - and I never scrolled back to take a look - that it appeared to me that it was just one test. Sorry for that.

Also, after fixing the 'cannot find libregina.so' error, it appears from taking a very cursory glance at 'allTests.out', that the test suite does find and enable Regina Rexx, even though tests still fail:

$ grep -i -E "rexx|regina" allTests.out
version.c(876)    HHC01417I With    Object REXX support
version.c(876)    HHC01417I With    Regina REXX support
hRexxapi.c(1272)  HHC17531W REXX(OORexx) dlopen 'librexx.so' failed: /lib64/librexx.so: undefined symbol: RexxQueryQueue
hRexx.c(162)      HHC17528I REXX(Regina) VERSION: REXX-Regina_3.9.1(MT) 5.00 5 Apr 2015
hRexx.c(163)      HHC17529I REXX(Regina) SOURCE:  UNIX
hRexx.c(490)      HHC17525I REXX(Regina) Rexx has been started/enabled
hRexx.c(1257)     HHC17500I REXX(Regina) Mode            : Command
hRexx.c(1261)     HHC17500I REXX(Regina) MsgLevel        : Off
hRexx.c(1265)     HHC17500I REXX(Regina) MsgPrefix       : Off
hRexx.c(1269)     HHC17500I REXX(Regina) ErrPrefix       : Off
hRexx.c(1273)     HHC17500I REXX(Regina) Resolver        : On
hRexx.c(1277)     HHC17500I REXX(Regina) SysPath    (10) : On
hRexx.c(1281)     HHC17500I REXX(Regina) RexxPath   ( 0) :
hRexx.c(1288)     HHC17500I REXX(Regina) Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx
cmdtab.c(670)     HHC01603I *If \$rexx_supported
cmdtab.c(670)     HHC01603I *Message REASON:   No Hercules Rexx support.
cmdtab.c(670)     HHC01603I *If $rexx_VERSION = ''
cmdtab.c(670)     HHC01603I *Message REASON:   Rexx is not installed.
cmdtab.c(670)     HHC01603I diag8cmd  enable noecho     # need diag8 to exec rexx script
cmdtab.c(670)     HHC01603I shcmdopt  enable diag8      # rexx script needs shell access
cmdtab.c(670)     HHC01603I rexx mode subroutine msglevel off msgprefix off errprefix off resolver on syspath on extensions .rexx start auto
hRexx.c(849)      HHC17522E REXX(Regina) Rexx already started/enabled
Error 93 running "/home/maarten/src/sdl-hercules-390-topdir/in-tree-build/hyperion/tests/3211.rexx", line 17: [Incorrect call to routine]
hRexxapi.c(923)   HHC17502E REXX(Regina) ExecCmd ReginaRexxStart RC(-93)
HHC01603I *Testcase invpsw processed 16 Jan 2016 12:11:11 by bldhtc.rexx
HHC01603I *Testcase leapfrog processed 16 Jan 2016 12:11:11 by bldhtc.rexx
HHC01603I *Testcase logicImmediate processed 12 Nov 2015 12:36:58 by bldhtc.rexx

allTests.out.gz

mhoes commented 6 years ago

Hrm. Just for fun and giggles, I just tried the 'make check' test suite for the 'other' Hyperion (https://github.com/hercules-390/hyperion), and that run completes without any failures for me on the same Linux installs as I am having the issue with described in this report for SDL-Hyperion. As far as I know, that version also uses rexx for the tests, which makes me assUme that something is going wrong with the interaction between SDL-Hyperion's 'make check' and rexx on my system, and not per definition that there is something wrong with my setup.

Based on those results, I would re-open this issue if I could; but it looks like I do not have the privileges or authority to do so.

Fish-Git commented 6 years ago

And it looks like 'nm' shows the symbol is there ?

$ nm -D /lib64/librexx.so | grep RexxQueryQueue
                 U RexxQueryQueue

No, it shows the symbol is not there. The 'U' means "undefined".
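
To make the distinction explicit, the type letter that nm prints in front of the symbol name can be checked directly. A minimal sketch, assuming plain "nm -D" output with the name in the last column and the type letter just before it; symbol_status is a hypothetical helper, and it treats only "U" as undefined:

```shell
# symbol_status SYMBOL   (reads `nm -D` output on standard input)
# Prints "undefined" if SYMBOL is listed with type U (referenced by the
# library but expected to come from elsewhere), "defined" for any other
# type letter, and "absent" if SYMBOL is not listed at all.
symbol_status() {
    awk -v sym="$1" '
        $NF == sym {
            found = 1
            type = $(NF - 1)
            print (type == "U" ? "undefined" : "defined")
            exit
        }
        END { if (!found) print "absent" }
    '
}

# Example (not run here):
#   nm -D /lib64/librexx.so | symbol_status RexxQueryQueue
```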

Having two Rexxes installed very much complicates things. Until we can figure out what is going on, I would suggest sticking with using just one or the other. I personally prefer OORexx over Regina, but that's a personal preference. The point is, uninstall one of them (or uninstall both of them and then re-install just one of them) so that you have one and only one rexx installed.

Once we get things working, then we (or rather, YOU!) can try installing the other one too if you want, and see what happens.

Fish-Git commented 6 years ago

And it looks like 'nm' shows the symbol is there ?

$ nm -D /lib64/librexx.so | grep RexxQueryQueue
                 U RexxQueryQueue

Additional info: according to Hercules code, Regina's rexx code is contained in shared library libregina.so, whereas Open Object Rexx's rexx code is contained in shared libraries librexx.so as well as librexxapi.so.

You might want to try your nm -D ... command(s) again.

Fish-Git commented 6 years ago

You probably noticed this already, ...

No, I hadn't noticed, because I discarded your previous allTests.out since I had no corresponding runtest log to go along with it. (that shows which tests failed and where)

Fish-Git commented 6 years ago

Also, after fixing the 'cannot find libregina.so' error, it appears from taking a very cursory glance at 'allTests.out', that the test suite does find and enable Regina Rexx, even though tests still fail:

Examining an 'allTests.out' file without a corresponding runtest log to go along with it is a waste of time. Due to the way the Hercules runtest test suite is designed (and this is true of the other Hyperion as well), the 'allTests.out' file might show lots of "failures" for tests that are logically being skipped (due to "*If" statements in the tst script).

I noticed you attached a new "allTests.out.gz" file, but I am not going to waste my time with it since it does not also contain the runtest log that goes along with it. I told you about this before.

Fish-Git commented 6 years ago

Hrm. Just for fun and giggles, I just tried the 'make check' test suite for the 'other' Hyperion (https://github.com/hercules-390/hyperion), and that run completes without any failures for me on the same Linux installs as I am having the issue with described in this report for SDL-Hyperion.

"make check"?   (Urk)

Believe it or not I completely missed that!  (me: embarrassed)

Yes, yes, I know you mentioned it right at the start of your problem report, but I didn't catch on to the fact that the problem might be in how "make check" is being done by SDL-Hercules-390's hyperion as opposed to the 'other' hyperion. I myself never bother doing any type of "make check". Instead, I always run my tests manually.

Let me take a look at how we're doing "make check". That's where the problem obviously is. Maybe the current directory isn't being set correctly or something else equally silly.

But in the mean time, does running all the tests manually (simple runtest invocation) work for you? I'm hoping it does and the problem is just in how we're doing 'make check'. Give it a try and let me know. Thanks! In the mean time I'll look at how we're doing make check.

Fish-Git commented 6 years ago

In the mean time I'll look at how we're doing make check.

My Makefile.am's "check:" is missing the special check for APPLE which the 'other' hyperion is doing, but otherwise has the exact same $(top_srcdir)/tests/runtest $(top_srcdir)/tests makefile statement to run all tests for "make check".

And for what it's worth, I just tried it on my CentOS 6.10 system just now and all 313 tests ran clean.

So I'm at a complete loss as to why it's not working for you. (Unless, as I said earlier, it has something to do with having multiple different rexxes installed?)

I might need to pull in someone else to work on this issue since I'm not a Linux person and problems like this one tend to stump me. :(

Fish-Git commented 6 years ago

It's just that the output scrolls off the screen so fast - and I never scrolled back to take a look - that it appeared to me that it was just one test. Sorry for that.

Word of advice: whenever you enter a command like make for example, you should always redirect the output to a log file so that you have a record of the output for that command. Then you don't have to worry about the output "scrolling off the screen so fast" that you miss something:

   make  >  make.log   2>&1

Afterwards you can then simply edit your make.log file to see everything that happened, and if anything goes wrong, you have a file that you can attach to a GitHub Issue.

As for myself, I use a simple bash script that also greps the resulting log file for errors or warnings too, which I then redirect to a completely separate "make-errors.log" file, which is the file I always look at to see if anything actually went wrong (and then if anything does go wrong, I can then examine the original "make.log" file to see all the nasty details).

Just trying to help.
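
The script mentioned above is not shown in this thread. A minimal sketch of the same idea, with the match pattern and the helper name split_errors as assumptions, might be:

```shell
# split_errors LOG ERRLOG
# Copy every line of LOG that mentions "error" or "warning" (any case)
# into ERRLOG, leaving LOG itself untouched.
split_errors() (
    log=$1
    errlog=$2
    # grep exits with status 1 when nothing matches; that is not a
    # failure here, it just means ERRLOG ends up empty.
    grep -i -E 'error|warning' "$log" > "$errlog" || true
)

# Typical use (not run here):
#   make > make.log 2>&1
#   split_errors make.log make-errors.log
```

A pattern this broad also catches compiler flags such as -Werror on ordinary command lines, so it may need narrowing for a particular build.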

mhoes commented 6 years ago

No, it shows the symbol is not there. The 'U' means "undefined".

Got it. Live and learn. Still, I don't see the point in explicitly mentioning all the things that are NOT in there, but that's another issue. Guess I have some googling to do now. ;)

Having two Rexxes installed very much complicates things.

If this really is the case, you might want to consider modifying the build system, so that one can no longer specify compile time support for both at the same time, and make it an 'either/or' setting.

Additional info: according to Hercules code, Regina's rexx code is contained in shared library libregina.so, whereas Open Object Rexx's rexx code is contained in shared libraries librexx.so as well as librexxapi.so.

The reason I went looking for RexxQueryQueue in librexx.so, is because allTests.out explicitly stated it failed to find it there :

dlopen 'librexx.so' failed: /lib64/librexx.so: undefined symbol: RexxQueryQueue

Examining an 'allTests.out' file without a corresponding runtest log to go along with it is a waste of time.

So, I guess you would want the full output of 'make check' on stdout/stderr as well ? Sorry, but I cannot find a 'runtest.log', I honestly do not see which other (log-)file you would want.

"make check"? (Urk)

The problem is not in the makefiles. Even if I specify './tests/runtest ./tests' or 'tests/runtest - sske' the issue still occurs.

make > make.log 2>&1

Yes, I know. I used to be a professional Unix/Linux sysadmin for years, I know how redirection in the shell works. ;)

The reason I generally don't bother to do so when running 'make' (I guess it's 'muscle memory' by now), is because generally speaking (but not always, as with 'make check' for example) when you run make you are compiling code, and in my personal experience most developers (not specifically talking about you here) don't really care about compiler WARNINGS. They do care about compiler ERRORS, but since when this occurs the build stops/breaks/aborts (unless you did 'make -k') and the last lines you see on the screen is the actual error, then why bother with redirection ? Oh, well.

The point is, uninstall one of them (or uninstall both of them and then re-install just one of them) so that you have one and only one rexx installed.

Ok, so I uninstalled both. All tests complete without errors with only OORexx installed. The problem only occurs with Regina, not with OORexx. Even when I only have Regina installed (and not both), the problem still occurs.

mhoes commented 6 years ago

For additional bonus points, when running 'make check' - with the same Regina rexx which causes the failure to occur with SDL-Hyperion - on the other Hyperion instead, it completes without errors.

So, (again, for fun and giggles) I did a little experimentation (which may not actually mean anything, I am certainly no expert when it comes to rexx), which led me to try out the 'tests/redtest.rexx' which comes with the other Hyperion, on SDL-Hyperion instead. And at least for the 'sske' test, SDL-Hyperion now completes the test successfully when using the other Hyperion's redtest.rexx.

Same thing for the following tests: 'cipher, digest, ilc, logicimm, mhi, privop, semipriv, sske, stfl': When I run these tests with 'tests/runtest - testname', they all fail when using the SDL-Hyperion's redtest.rexx, and they all succeed when using the other Hyperion's redtest.rexx. Of course, when I do a full 'make check' with the other redtest.rexx I get all kinds of other (expected) failures, which seem to have to do with variables not being declared or functions not existing in the other redtest.rexx version.
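
Repeating that comparison for each test by hand is tedious. The sketch below extracts the failure count from the "Did N tests.  M failed; ..." summary line shown earlier in this thread. It assumes that exact wording; how the summary reads when nothing fails is not shown here, so empty output only means the line did not match. failed_count is a hypothetical helper name.

```shell
# failed_count   (reads runtest output on standard input)
# Print the number of failed tests from a summary line of the form
#   Did 2 tests.  1 failed; 1 OK.
failed_count() {
    sed -n 's/^Did [0-9]* tests\. *\([0-9]*\) failed;.*/\1/p'
}

# Example loop over the tests listed above (not run here):
#   for t in cipher digest ilc logicimm mhi privop semipriv sske stfl; do
#       printf '%s: ' "$t"
#       tests/runtest - "$t" 2>&1 | failed_count
#   done
```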

Again, no idea if this actually means something, but if it does then this might imply that whatever it is that you are doing differently in your redtest.rexx version does not go down well with Regina rexx. I did do a diff between the two versions, but since there are so many differences and I really have no idea what to look for anyway that did not lead me to further insights.

Fish-Git commented 6 years ago

make > make.log 2>&1

Yes, I know. I used to be a professional Unix/Linux sysadmin for years, I know how redirection in the shell works. ;)

Sorry! It's sometimes hard to tell.

For your own information, I'm a Windows person, not a Unix/Linux person. Classify me as a Unix/Linux newbie or worse. I know only enough about Unix/Linux to be extremely dangerous. ;-)

I do a lot of Googling and then try to do the best I can.

I have CentOS 6.10 set up in a VMware virtual machine that I've had for many years now that I use for Hercules build testing purposes (started out at 6.4), and I'm now trying to set up another Mac OSX 10.13 High Sierra virtual machine too for the same reason (which is going extremely slowly), so I'm trying.

Since I'm now aware of your background, if I say anything stupid, be sure to let me know! You have more to teach me than vice versa! :)

mhoes commented 6 years ago

So, I did a new CentOS 6.10 installation, added the 'ghettoforge' repository to it, and installed Regina rexx from it. Compiled SDL-Hyperion, ran 'make check', and guess what ? The same tests fail as on my Fedora 28 installation: cipher, digest, ilc, logicimm, mhi, privop, semipriv, sske, stfl. I could not verify if things do work as expected on this CentOS install with OORexx, as I couldn't find a pre-built rpm during some quick google searches, but perhaps you can point me to one ?

Fish-Git commented 6 years ago

Ok, so I uninstalled both. All tests complete without errors with only OORexx installed. The problem only occurs with Regina, not with OORexx. Even when I only have Regina installed (and not both), the problem still occurs.

Thank you! That is good information to know. I must be doing something in redtest.rexx somewhere that is non-portable. I'll look into it as soon as I can.

Thanks again.

mhoes commented 6 years ago

Sorry! It's sometimes hard to tell.

No Worries.

For your own information, I'm a Windows person, not a Unix/Linux person.

Yes, I know. I seem to recall an email exchange ages ago, where I explained something about Linux (was it how to run lcov ? Can't recall), and you may have been the person that showed me how to compile Hercules on Windows XP ? (even though I have blissfully forgotten this information since.)

I do a lot of Googling and then try to do the best I can.

Same here. In fact, when I just set up my CentOS 6.10 install, only to find out that ipv6 did get configured but not ipv4 (WTF? Really ?) I also had to google to figure out quickly how to fix that.

I have CentOS 6.10 set up in a VMware virtual machine that I've had for many years now

You may want to consider adding a VM with a more recent Linux version (kernel/libc/gcc/etc).

that I use for Hercules build testing purposes

Sounds like good developing practice.

Since I'm now aware of your background, if I say anything stupid, be sure to let me know!

Well... Alright, I'll try to put this as politely as possible: You do seem to be 'jumping to conclusions' just a little, with the immediate assumption that there must be something wrong with my system (messed up rexx, compiler bug) based on the fact that it was working for you on another Linux distro.

You have more to teach me than vice versa! :)

I doubt that. For example, I have exactly zero knowledge about how to write code, and the last time I seriously looked at Windows (besides 'just using it' on the desktop) was way back in 2000 or so with Windows NT4, which is ancient history in ICT-years.

Fish-Git commented 6 years ago

You do seem to be 'jumping to conclusions' just a little, with the immediate assumption that there must be something wrong with my system (messed up rexx, compiler bug) based on the fact that it was working for you on another Linux distro.

Of course! Hercules is perfect dontyouknow? Absolutely no bugs in Hercules code. Therefore the problem must be with your system or something you're doing wrong. Hercules can never be at fault! ;-)

(and if you believe that one I have a friend with a bridge he's willing to sell to you for a good price.)

Fish-Git commented 6 years ago

Examining an 'allTests.out' file without a corresponding runtest log to go along with it is a waste of time.

So, I guess you would want the full output of 'make check' on stdout/stderr as well?

Yes! It is make check that is invoking Hercules's runtest script which is where the problem is (or rather, in the redtest.rexx script that the runtest script invokes), so it is that output I need to see. Redirect the output of make check to a log file and send me that along with the corresponding allTests.out file that gets created as well.

Sorry, but I cannot find a 'runtest.log', I honestly do not see which other (log-)file you would want.

Do this: manually run runtest for yourself (don't specify any specific test; let it default to running all tests) and redirect the output to a log file. Then send me that log file along with everything in your current directory (which is where the allTests.out file and other runtest work files should be). Does that make sense?

In the mean time I'm going to try installing Regina (which I haven't done in a looooong time) on my CentOS test system (after uninstalling OORexx) so I can maybe see for myself what the heck is going on.

Fish-Git commented 6 years ago

... imply that whatever it is that you are doing differently in your redtest.rexx version does not go down well with Regina rexx.

Which is what is bothering me! I mean, I thought Rexx was supposed to be portable, yes? The same rexx code should behave identically across all rexx implementations, yes? And if it doesn't, then it means there's a bug in that implementation, yes? (*)


(*) Yes, I know! There I go trying to place the blame elsewhere instead of on Hercules. Refer to two comments above for why I so often do this. >;-)

mhoes commented 6 years ago

Redirect the output of make check to a log file and send me that along with the corresponding allTests.out file that gets created as well.

Alright, so I guess a simple './tests/runtest > runtest.out.txt 2>&1' would be sufficient.

Then send me that log file along with everything in your current directory

Well, since I do an intree build, and I run runtest in the top-level source code directory (which includes sources, object files, binaries, etc.) I highly doubt you would want that ;). However, the files that seem to get created are: "./allTests.testin ./3211.txt ./allTests.out", so I guess you mean those, and nothing else.

Well here is a gzipped tar file containing those, including the output of runtest of course.

runtest.logs.tar.gz
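
For anyone wanting to produce the same kind of bundle, a sketch of the steps follows. The four file names are the ones mentioned in this comment; bundle_runtest_logs is a hypothetical helper, not something shipped with Hercules.

```shell
# bundle_runtest_logs DIR ARCHIVE
# Pack the redirected runtest output and the runtest work files found in
# DIR into a gzipped tar file at ARCHIVE.
bundle_runtest_logs() (
    dir=$1
    archive=$2
    # -C: change into DIR first, so the archive holds bare file names
    tar -czf "$archive" -C "$dir" \
        runtest.out.txt allTests.testin 3211.txt allTests.out
)

# Typical use from the top-level build directory (not run here):
#   ./tests/runtest > runtest.out.txt 2>&1
#   bundle_runtest_logs . runtest.logs.tar.gz
```

Packing only the named files keeps sources, object files and binaries from an in-tree build out of the archive.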

mhoes commented 6 years ago

I mean, I thought Rexx was supposed to be portable, yes? The same rexx code should behave identically across all rexx implementations, yes? And if it doesn't, then it means there's a bug in that implementation, yes?

Yes, agreed. However, I do seem to vaguely recall that one of the implementations had 'extensions' to the language only implemented in that version. Also, even when there is a formal specification, implementations can and do still differ from time to time, for example because the specification is ambiguous, or left something out unintentionally, or even leaves certain things open to the implementation on purpose.

Having said that, I still think it is most likely that there is a bug in Regina. Which leaves you with these options: work around the bug, drop support for Regina, or open a bug report for Regina and, if/when it gets fixed, make that version the minimum required one.

mhoes commented 6 years ago

In the mean time I'm going to try installing Regina (which I haven't done in a looooong time) on my CentOS test system (after uninstalling OORexx) so I can maybe see for myself what the heck is going on.

Well you could compile from sources of course, but here is what I did :

Add another repository on CentOS called 'ghettoforge' (yes, it's an outdated repo that's not updated anymore so 'everyone' recommends against using it) and then install from that repo :

wget http://mirror.ghettoforge.org/distributions/gf/el/6/gf/x86_64/gf-release-6-10.gf.el6.noarch.rpm
sudo rpm -Uvh ./gf-release-6-10.gf.el6.noarch.rpm
sudo yum --enablerepo=gf install Regina-REXX Regina-REXX-devel

--- EDIT ---

Oh! Important bit: the 'x86_64' part in the url that you feed the 'wget' command, points to the architecture. This one is for 64-bit Intel/AMD. You might need the 32-bit version instead, depending on your hardware and what you installed in that CentOS VM. In which case you need to replace that part with 'i386'.

Fish-Git commented 6 years ago

Alright, so I guess a simple './tests/runtest > runtest.out.txt 2>&1' would be sufficient.

Well here is a gzipped tar file containing those, including the output of runtest of course.

runtest.logs.tar.gz

Perfect! Thank you!

Fish-Git commented 6 years ago

FYI: After removing oorexx and installing regina rexx on my centos system, I am now seeing exactly the same thing that you are (i.e. the exact same tests are failing in the exact same way): except for the first 3211 test failure (which might be a clue; I'm not sure yet), ALL of the failing tests are failing on register compares!

The last change I made to redtest.rexx was to try and "genericize" the register value extraction logic, because the way Hercules displays register values depends on whether more than one CPU is defined and on whether the register being displayed is a general purpose register or a control register:

Revision: 74d8d366e98efe63995da87d1cd314f67c21af4d
Author: Fish (David B. Trout) <fish@infidels.org>
Date: 5/7/2018 7:05:42 AM
Message:
Fix redtest.rexx register parsing bug when numcpu > 1
----
Modified: tests/redtest.rexx

So there's something I'm doing in that function that Regina doesn't like. Either the parse statement or the interpret statement or the way it's being called...

I'm on it!

mhoes commented 6 years ago

except for the first 3211 test failure (which might be a clue; I'm not sure yet),

Does this line in 'allTests.out' help with that one ?

Error 93 running "/home/maarten/src/sdl-hercules-390-topdir/in-tree-build/hyperion/tests/3211.rexx", line 17: [Incorrect call to routine]

mhoes commented 6 years ago

The last change I made to redtest.rexx was [...] So there's something I'm doing in that function that Regina doesn't like

I honestly do not know what exact change/commit caused this behavior. I think I saw this failure some time before I reported it (can't recall how long ago, or if this really is the case [sorry, 'memory' is failing me these days]). I just did not report it until now. That does not by definition mean that the 'last' commit you made caused this issue. Not saying it didn't either, just saying you might not want to overly focus on 'the last change I made'.

mhoes commented 6 years ago

(and if you believe that one I have a friend with a bridge he's willing to sell to you for a good price.)

Owwhh, you're willing to sell me a friend under a bridge for 'a good price' ? PM me for details ;)

--- EDIT ---

Lost In Translation

Fish-Git commented 6 years ago

PROBLEM FOUND AND FIXED by commit 46b278269351813a5f49a7d3e03ea63fc413a4b9.

Closing issue! @mhoes? Please pull latest git, rebuild and retry. It should work now.

If for some reason it doesn't, please re-open this issue.

Thanks for all your patience!

mhoes commented 6 years ago

All pass now, except for '3211'. But I don't think it is related, so I may need to open up a new issue for that ?

Anyways, for that test with Regina I get an error, but by reading the diagnostic error message I honestly cannot tell if it is 'expected' behavior or not :

>>>>> line   326: Received unexpected wait state:  000A0000 000100D8
>>>>> line   374: Storage compare mismatch.
                  Want: R:00001000 00000000 00000000 00000000 00000000  "Return Code flags"
                  Got:  R:00001000 000000F3 F4F5F6F7 F8F90000 00000000

                  If the above is an "unexpected wait state" of 000100D8
                  it means that the DIAG8 instruction completed with a non-
                  zero condition code.  The likely reason for this is the
                  results buffer wasn't large enough because the Hercules
                  command the test issued resulted in an unexpected error
                  message (which couldn't fit into DIAG8's response buffer).

                  If any of the below test completion flags are non-zero
                  it means that particular test has failed.  For example,
                  if the completion flags are 000000F3 F4F5F6F7 F8F90000...
                  it means that tests 3 thru 9 have failed (F3 = '3' etc).

Test "3211 printer":  0 OK compares.  2 failures.
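For anyone reading the "Got:" line above, here is a minimal sketch of how the completion flags decode, going only by what the diagnostic text says (each non-zero byte is the EBCDIC character of a failed test, F3 = '3' and so on). The helper name `failed_tests` is made up for illustration; it is not part of the Hercules test suite.

```python
def failed_tests(flags_hex):
    """Return the test IDs encoded as non-zero EBCDIC bytes in the flags."""
    raw = bytes.fromhex(flags_hex.replace(" ", ""))
    # Each non-zero byte is an EBCDIC character naming a failed test;
    # code page 037 maps 0xF0..0xF9 to the digits '0'..'9'.
    return [bytes([b]).decode("cp037") for b in raw if b != 0]


# Flag bytes copied from the "Got:" line above (without the R:00001000 address).
print(failed_tests("000000F3 F4F5F6F7 F8F90000 00000000"))
# → ['3', '4', '5', '6', '7', '8', '9']
```

So in this run every test from 3 through 9 is flagged as failed, which matches the explanation in the diagnostic message.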

And with OORexx, I get this error :

SKIPPING: Testcase 3211 printer
REASON:   Rexx is not installed.

Which obviously isn't true. Somehow it seems like the variable 'rexx_VERSION' does not get set with OORexx, which you explicitly test for in tests/3211.tst (but nowhere else). If I remove that check from 3211.tst, I get the same error with OORexx as with Regina.

Here are the logs from both the Regina and OORexx runs.

runtest.regina.logs.tar.gz runtest.oorexx.logs.tar.gz

Fish-Git commented 6 years ago

All pass now, except for '3211'. But I don't think it is related, so I may need to open up a new issue for that?

I don't see the need. We might as well continue working on the problem in this issue since it was this issue that originally detected it and the problem is still not fixed.

[...]

And with OORexx, I get this error :

SKIPPING: Testcase 3211 printer
REASON:   Rexx is not installed.

Which obviously isn't true. Somehow it seems like the variable 'rexx_VERSION' does not get set with OORexx, which you explicitly test for in tests/3211.tst (but nowhere else). If I remove that check from 3211.tst, I get the same error with OORexx as with Regina.

(Dang it! Now how and the heck did I miss that?!)

(sigh) Re-opening due to above error...

mhoes commented 6 years ago

(Dang it! Now how and the heck did I miss that?!)

Perhaps because "the output was scrolling off the screen so fast" that you missed it ? ;)

To be fair though, the last line in the output does say 'Did 313 tests. All OK.', even though one was skipped (which I guess, does not count as 'failure' according to the current logic).

Fish-Git commented 6 years ago

And with OORexx, I get this error :

SKIPPING: Testcase 3211 printer
REASON:   Rexx is not installed.

Which obviously isn't true.

Actually, from Hercules's point of view, it absolutely is true.

While you might believe Rexx is installed, since you did after all install it and you built Hercules with "--enable-object-rexx" support (which we can see in allTests.out lines 29-30):

version.c(876)    HHC01417I With    Object REXX support
version.c(876)    HHC01417I Without Regina REXX support

but... it is unfortunately still not installed properly. Notice lines 45-55:

hRexxapi.c(1272)  HHC17531W REXX(OORexx) dlopen 'librexx.so' failed: /lib64/librexx.so: undefined symbol: RexxQueryQueue
hRexx.c(495)      HHC17511E REXX() Could not enable either Rexx package
hRexx.c(246)      HHC17511E REXX() Could not enable default Rexx package
hRexx.c(1257)     HHC17500I REXX() Mode            : Command
hRexx.c(1261)     HHC17500I REXX() MsgLevel        : Off
hRexx.c(1265)     HHC17500I REXX() MsgPrefix       : Off
hRexx.c(1269)     HHC17500I REXX() ErrPrefix       : Off
hRexx.c(1273)     HHC17500I REXX() Resolver        : On
hRexx.c(1277)     HHC17500I REXX() SysPath    (10) : On
hRexx.c(1281)     HHC17500I REXX() RexxPath   ( 0) :
hRexx.c(1288)     HHC17500I REXX() Extensions ( 8) : .REXX:.rexx:.REX:.rex:.CMD:.cmd:.RX:.rx

It appears OORexx is still not installed properly on your system. You're still getting the "undefined symbol" error for RexxQueryQueue in /lib64/librexx.so, which is causing OORexx to not get loaded/started, thereby leaving Rexx unavailable for Hercules use.

Somehow it seems like the variable 'rexx_VERSION' does not get set with OORexx, which you explicitly test for in tests/3211.tst (but nowhere else).

The 3211.tst is not testing for OORexx. Rather, it is testing for any Rexx being installed (by testing for a non-empty rexx_VERSION):

    *If $rexx_VERSION = ''
        *Message SKIPPING: Testcase 3211 printer
        *Message REASON:   Rexx is not installed.
    *Else

Now it may be debatable whether the REASON *Message should be "Rexx is not installed" or something else such as maybe "Rexx is not properly installed" (or perhaps "Rexx is not available"), but the point is, from Hercules's point of view, Rexx is indeed not installed!

As I believe I mentioned earlier, Open Object Rexx's code is contained in two shared libraries (as opposed to just one like Regina): "librexx.so" and "librexxapi.so", and it is the second one -- librexxapi.so -- that contains the RexxQueryQueue symbol:

[fish@centos-64 Desktop]$ nm -D /usr/lib64/librexxapi.so | grep "RexxQueryQueue"
0000003d3640db00 T RexxQueryQueue
[fish@centos-64 Desktop]$ 

I'm wondering whether installing Regina before/after OORexx (or vice-versa) has somehow messed up your (library search path?) such that Hercules (or the system itself?) is unable to find/load the librexxapi.so shared library, thereby leading to the undefined symbol: RexxQueryQueue error we're seeing in Hercules?

In any case, I know you hate it when I say it but I feel compelled to say it anyway, since, from my point of view, it certainly seems true!:

This does not seem to be a problem with Hercules but rather a problem with your system (i.e. at your end of things). Either your PATH isn't right (or your "library path"? Remember, I'm not a Linux person!), or something else is causing Hercules's attempt to load and start OORexx to fail because it can't find the librexxapi.so shared library (which is where the RexxQueryQueue symbol lives that the librexx.so shared library needs to resolve).

What the problem ultimately is and how to properly fix it I leave up to you.

(And believe it or not, after having just re-opened this issue, I'm now going to re-close it again!)

mhoes commented 6 years ago

Well... I guess this discussion could go on just about forever, but I am not looking forward to that. So, final notes:

The librexxapi.so on my system does contain "RexxQueryQueue" :

$ nm -D /usr/lib64/librexxapi.so | grep "RexxQueryQueue"
0000000000013559 T RexxQueryQueue

And the 'library finding thingy' on Linux does find it :

$ ldconfig -p | grep librexxapi.so
        librexxapi.so.4 (libc6,x86-64) => /lib64/librexxapi.so.4
        librexxapi.so (libc6,x86-64) => /lib64/librexxapi.so
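The same check can be scripted so both shared libraries are looked up in one go. This is only a sketch: `parse_ldconfig` and `missing_from_cache` are hypothetical helper names, and the sample text is just the two (grep-filtered) lines shown above.

```python
def parse_ldconfig(text):
    """Map each library name in `ldconfig -p` style output to its resolved path."""
    libs = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # skip the header line and blank lines
        left, path = line.split("=>", 1)
        name = left.split()[0]  # first token is the library name
        libs[name] = path.strip()
    return libs


def missing_from_cache(text, wanted=("librexx.so", "librexxapi.so")):
    """Return the wanted library names that the loader cache does not list."""
    cache = parse_ldconfig(text)
    return [name for name in wanted if name not in cache]


# Sample copied from the output above; it only contains the librexxapi lines.
sample = """\
        librexxapi.so.4 (libc6,x86-64) => /lib64/librexxapi.so.4
        librexxapi.so (libc6,x86-64) => /lib64/librexxapi.so
"""
print(parse_ldconfig(sample))
print(missing_from_cache(sample, ("librexxapi.so",)))
```

On a real system the text would come from the full, unfiltered `ldconfig -p` output, so that both librexx.so and librexxapi.so can be checked at once.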

And why do the tests/scripts that actually use OORexx still function correctly ? (apart from the 'version' test). Because 'it isn't installed properly' ? Does not compute.

Now, I really do not know why the Hyperion goes looking for RexxQueryQueue in /lib64/librexx.so when it actually is in /lib64/librexxapi.so, resulting in :

dlopen 'librexx.so' failed: /lib64/librexx.so: undefined symbol: RexxQueryQueue

(Are you properly using/calling 'dlopen()' ? including the right header files everywhere ? #include ? Frak if i know.)

But that should be your particular area of expertise, not mine.

PS: How does the test suite actually go about determining the rexx version installed ? The equivalent of running 'rexx -v' and capturing what gets returned ? Some magic rexx API call ? Just curious.

mhoes commented 6 years ago

So... Final, final words (I sincerely hope):

I just compiled OORexx from source on my cleanly installed CentOS 6.10 system (where, apart from the defaults, my only addition was installing the Regina RPMs, which I removed before building OORexx from source), rebuilt latest git Hyperion, ran 'make check', and guess what ?

I get the same frakkin message :

SKIPPING: Testcase 3211 printer
REASON:   Rexx is not installed.

So, unless the simple act of installing and removing packages on Linux has been broken for everyone for years now, I am inclined to think that something is going wrong with the Hyperion code/tests.

PS: You could have tried to verify the behavior on your CentOS 6.10 install, but, well, no idea why you didn't do that.

mhoes commented 6 years ago

Well, that all sounded a lot harsher than what was intended. Sorry for that, at least. For whatever that is worth.

Fish-Git commented 6 years ago

Final, final words (I sincerely hope)

Don't say that! We can still work together on this problem of yours in this GitHub Issue thread even though the issue is closed. Being in "closed" status shouldn't preclude us from continuing our conversation!

Now, I really do not know why the Hyperion goes looking for RexxQueryQueue in /lib64/librexx.so ...

It doesn't. It simply attempts to load the librexx.so library and the error you see is the result.
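One way to take Hercules out of the picture is to load the library from a tiny standalone script and see whether the loader reports the same error. The sketch below uses Python's `ctypes`, which calls `dlopen()` on POSIX systems. It assumes a plain load of `librexx.so` behaves like the load Hercules performs; the exact flags Hercules passes are not shown in this thread, and `try_load` is just an illustrative name.

```python
import ctypes


def try_load(libname):
    """Return None if the library loads, else the loader's error text."""
    try:
        # On POSIX, ctypes.CDLL calls dlopen(); a failure raises OSError
        # carrying the dlerror() message (e.g. "undefined symbol: ...").
        ctypes.CDLL(libname)
    except OSError as exc:
        return str(exc)
    return None


error = try_load("librexx.so")
print("librexx.so:", error if error else "loaded OK")
```

If this prints the same "undefined symbol: RexxQueryQueue" text, the problem is reproducible without Hercules; if it loads cleanly, the difference lies in how or where Hercules performs the load.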

(Are you properly using/calling 'dlopen()' ? including the right header files everywhere? #include ? Frak if i know.)

Of course we are!

I just compiled OORexx from source on my cleanly installed CentOS 6.10 system (where, apart from the defaults, my only addition was installing the Regina RPMs, which I removed before building OORexx from source), rebuilt latest git Hyperion, ran 'make check', and guess what ?

I get the same frakkin message :

SKIPPING: Testcase 3211 printer
REASON:   Rexx is not installed.

Check your allTests.out file: is the same "undefined: RexxQueryQueue" error message occurring? I suspect it probably is, which is why the 3211 test thinks Rexx isn't installed: because it's unavailable to Hercules because Hercules is unable to load and start it (because of the undefined RexxQueryQueue error which I have no idea why is occurring; fwiw it doesn't occur on my own system).

Perhaps building from source is not the proper way to install OORexx? It was so long ago that I installed it on my own CentOS 6.10 system (which was 6.4 at the time I think) that I honestly can't recall how I did it. I think I may have used the 'rpm'(?) method. (I rarely install packages from source unless I absolutely have to.)

PS: You could have tried to verify the behavior on your CentOS 6.10 install, but, well, no idea why you didn't do that.

What?! What makes you think I didn't?

As for myself, I personally prefer OORexx over Regina, so I've had OORexx installed on my CentOS system since, like, forever! And it works just fine! The 3211 test is not skipped and it passes just fine on my CentOS 6.10 system, so why doesn't it on yours?!

Fish-Git commented 6 years ago

FYI: from a test run on my CentOS 6.10 system that I just did just now to confirm things:

[...]
Test "3211 printer":  2 OK compares.  All pass.
[...]
Did 313 tests.  All OK.

So why doesn't it work on your system?

Fish-Git commented 6 years ago

PS: How does the test suite actually go about determining the rexx version installed? The equivalent of running 'rexx -v' and capturing what gets returned? Some magic rexx API call? Just curious.

Nope: None of those. It's done by the redtest.rexx script itself which parses and extracts information directly from the Hercules messages themselves (in this case the HHC17528I, HHC17529I, etc, messages; if you know rexx, refer to the 'hrexx' subroutine in redtest.rexx).

Fish-Git commented 6 years ago

BREAKING NEWS!

I just now tried installing Regina 3.9.1 onto my CentOS 6.10 system (which already has OORexx 4.2.0 installed since, like, forever) and got the following "Conflict" dialog message when I went to install the Regina-REXX-3.9.1-1.x86_64-CentOS-6.6.rpm package:

Test Transaction Errors:
  file /usr/bin/rexx from install of Regina-REXX-3.9.1-1.x86_64 conflicts with file from package ooRexx-4.2.0-9940.rhel65.x86_64
  file /usr/bin/rxqueue from install of Regina-REXX-3.9.1-1.x86_64 conflicts with file from package ooRexx-4.2.0-9940.rhel65.x86_64
  file /usr/share/man/man1/rxqueue.1.gz from install of Regina-REXX-3.9.1-1.x86_64 conflicts with file from package ooRexx-4.2.0-9940.rhel65.x86_64

(The pre-requisite Regina-REXX-lib-3.9.1-1.x86_64-CentOS-6.6.rpm installed just fine.)

When I did my Regina Rexx tests, I uninstalled (yum remove ...) my OORexx installation beforehand, so when I did my tests I had only Regina installed. This is the first time I have tried having both installed at the same time (which, if the above error message is any indication, you apparently cannot do!).

Can I safely presume that when you installed one or the other (or both?) from source that you verified there were no errors anywhere? (Hey! Just asking! I know you're an experienced Linux geek but even we geeks sometimes have lapses!)

And for what it's worth, when I installed Regina (by itself, after having uninstalled OORexx beforehand), I had to install all three rpm packages in the proper order: the 'lib' rpm first (because attempting to install the binaries first resulted in a dependency check error that said one of the .so shared libraries wasn't available or something), the regular binaries rpm second, and the 'devel' rpm last.

I still have my VMware snapshot where only Regina is installed. Let me try installing OORexx on top of that to see if any type of similar conflict error occurs. It might be that if you want to have both installed, you need to do so in the right order/sequence? (i.e. Regina first then OORexx, or vice-versa?)

Fish-Git commented 6 years ago

MORE BREAKING NEWS!

I still have my VMware snapshot where only Regina is installed. Let me try installing OORexx on top of that to see if any type of similar conflict error occurs.

Yep! Same thing!

Dialog text:

Test Transaction Errors:   file /usr/bin/rexx from install of ooRexx-4.2.0-9940.rhel65.x86_64 conflicts with file from package Regina-REXX-3.9.1-1.x86_64
  file /usr/bin/rxqueue from install of ooRexx-4.2.0-9940.rhel65.x86_64 conflicts with file from package Regina-REXX-3.9.1-1.x86_64
  file /usr/share/man/man1/rxqueue.1.gz from install of ooRexx-4.2.0-9940.rhel65.x86_64 conflicts with file from package Regina-REXX-3.9.1-1.x86_64

So apparently -- at least on CentOS 6.10 (6.x?) -- you cannot have both Rexx packages installed at the same time.

Which is more than likely your problem since that appears to be what you are trying to do.

Hercules is designed to support having both installed at the same time (perhaps incorrectly, but that's another issue altogether), but only if you can logically do so (i.e. only if your system and/or the Rexx packages themselves allow it (which they apparently do not)).

Maybe to "switch" between the two Rexx installations you need to change (update) your PATH? I presume the only way you managed to get both of them installed on your system was to install them into completely different places so that there were no conflicts between the two (since you installed from source and not from an rpm).

When you do a which rexx, which one is found? When you do a rexx -v, which one runs?
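Those two checks can be wrapped in a few lines so the answer can be pasted into the issue. A rough sketch, assuming the interpreter is installed under the name `rexx` and accepts `-v` to print its version as mentioned above; `describe_rexx` is a hypothetical helper name.

```python
import shutil
import subprocess


def describe_rexx(name="rexx"):
    """Return (path, version text) for the first `name` on PATH, or None."""
    path = shutil.which(name)  # same lookup that `which rexx` performs
    if path is None:
        return None
    # Ask the interpreter that was found to identify itself.
    result = subprocess.run(
        [path, "-v"],
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
        timeout=10,
    )
    return path, (result.stdout or result.stderr).strip()


print(describe_rexx())
```

Note this only tells you which rexx the shell (and therefore runtest) would pick up; it says nothing about which shared library Hercules itself loads.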

I'm not a Linux person but it sounds to me like you need to do a bit of special handling in order to switch from one to the other. And you probably need to do that before you try running Hercules with support for that one enabled.

I should probably change configure.ac to disallow enabling both. I don't know why Enrico designed it that way.

Fish-Git commented 6 years ago

@mhoes : Just issuing an FYI "GitHub mention" in case you've already unsubscribed from notifications (perhaps out of disgust/frustration) as a result of my having closed this issue. Some new breaking news has arisen: Refer to my most recent comments: apparently you cannot have both rexxes installed at the same time.

Fish-Git commented 6 years ago

And why do the tests/scripts that actually use OORexx still function correctly ? (apart from the 'version' test). Because 'it isn't installed properly' ? Does not compute.

Sure it does. Except for the 3211 test script, none of the other test scripts need nor use rexx at all. All of the other tests only require/use Hercules, but not rexx. The 3211 test is currently the only test that needs to use rexx.

And the redtest.rexx script which processes Hercules's test output (allTests.out) and reports the success/failure of each test only requires a working rexx, but doesn't otherwise care which rexx is used. The rexx that is used by redtest.rexx can be a completely different rexx than the one Hercules was built with.

Thus, if your OORexx installation is broken (as it appears to be) whereas your Regina rexx installation works just fine, then a Hercules build using --enable-object-rexx will fail (thereby leading to the 3211 test failure, which is the only test that requires/uses rexx) whereas the redtest.rexx script will work just fine, and thus be able to report the success/failure of all of the tests. The redtest.rexx script is probably using Regina in such a situation whereas Hercules was built to use OORexx (which is broken).

Conversely, if Hercules is built using --enable-regina-rexx, then Hercules will work fine (because your Regina installation is working fine) and the 3211 test will pass, and again, the redtest.rexx test reporting script will also work fine (because once again you do have a working rexx installed: Regina!) and be able to report the success/failure of all the tests.

The key takeaway here is the rexx that the redtest.rexx runtest reporting tool uses doesn't need to be the same rexx that Hercules is trying to use. They can be different. (Which seems to be the case in your environment.)

But regardless of which rexx is the working one and which rexx is the broken one, the rexx that Hercules was told to use (--enable-xxxx-rexx) must be a working rexx, or else the 3211 test will fail (since it requires a working Hercules rexx environment in order to run all of its tests).

Does that "compute" now?

mhoes commented 6 years ago

One: Yes, I fully realize one cannot have two different files with the same name located in the same directory. And that yum/dnf are smart enough to detect such conflicts/errors and prevent you from doing so. Yes, at the beginning I did install Regina from source (but OORexx from the system repos provided). Yes, I did make sure that it was in another location: OORexx lived in "/usr/bin, /usr/lib64, /usr/include" and Regina in "/usr/local/bin, /usr/local/lib64, /usr/local/include". Yes, I did set the PATH before each test so the right version would be found. Yes, ./configure and friends are smart enough to find both in both locations and distinguish between the two, or else the build with support for both would have broken because the right header files to include or the right libraries to link against would not have been found.

Two: Back when you still thought that the initial issue was caused by me having two rexx implementations installed at the same time (instead of it being an incompatibility between Regina and OORexx), I removed both, and from that point onward made sure I had only one installed at a time (and switched to the system repos' versions instead of sources). (At least for Fedora; I could not find a pre-built OORexx for CentOS 6.10, so that one is still built from source, but even there I do not have two versions installed at the same time.)

So from that point forward, each test sequence was:

  1. Remove the rexx package version I am not testing with at that point (for example, Regina).
  2. Install the rexx package version I plan to test with (for example, OORexx).
  3. Rebuild Hyperion with support for only that rexx version (OORexx, for example).
  4. Run the tests with that rexx version (OORexx, for example).

By having only one of the two installed at any given time, I made sure there could be no conflicts, either at build time or test time. So, when I reported the '3211' error for OORexx, only OORexx was installed, both at compile time and at runtest execution time. So your scenario of "redtest.rexx is run by another rexx implementation than the one Hyperion is built with" does not apply, because the two are no longer installed at the same time, and therefore the other one cannot be 'found'.

Which brings me back to my point: How come redtest.rexx is executed properly by OORexx, but Hyperion appears to be having a problem with OORexx ?

No more comments from the peanut gallery, please. ;)

For what it's worth: Although the same 3211 issue occurs with OORexx on CentOS for me, there is no 'dlopen' error message in allTests.out on CentOS. Which seems to contradict your assessment that the 'dlopen' message on Fedora is the root cause of the issue. --EDIT-- Or, at least, I'm running into two different root causes on Fedora and CentOS which lead to the same result.

mhoes commented 6 years ago

PS: How does the test suite actually go about determining the rexx version installed? The equivalent of running 'rexx -v' and capturing what gets returned? Some magic rexx API call? Just curious.

Nope: None of those. It's done by the redtest.rexx script itself which parses and extracts information directly from the Hercules messages themselves (in this case the HHC17528I, HHC17529I, etc, messages; if you know rexx, refer to the 'hrexx' subroutine in redtest.rexx).

Alright, so how does Hercules determine what the rexx version is, then ? The magic appears to be in 'hRexx.c', but my mad l33t coding skillz fail me. ;)

Fish-Git commented 6 years ago

So from that point forward, each test sequence was: remove the rexx package version I am not testing with at that point (for example, Regina). Install the rexx package version I plan to test with (for example, OORexx). Rebuild Hyperion with support for only that rexx version (OORexx, for example). Run the tests with that rexx version (OORexx, for example). So by only having one of the two installed at the same time by that point, I made sure there could be no conflicts, either at build time or test time. So, when I reported the '3211' error for OORexx, I made sure that only OORexx was installed both at compile time and at runtest execution time. So your scenario of: "redtest.rexx is run by another rexx implementation than the one Hyperion is built with" does not apply, because the two are no longer installed at the same time, and therefore it cannot be 'found'.

Wow. Okay. So only OORexx was installed when you built Hercules with only OORexx specified (--enable-object-rexx=yes), and yet despite that, the 3211 test still fails?! (Because Hercules still gets that "undefined RexxQueryQueue" error?) Do I have that right?

If so, I don't know what to say, @mhoes. I really don't. I'm stumped. It works just fine for me on my own CentOS 6.10 system. I can't explain why it doesn't work on your system. It's got to be something simple that you're overlooking or not doing right that is escaping both of us.

Maybe it's switching from one to the other? Let me try that test.

Since I do my testing using VMware virtual machines, I have the luxury of being able to take snapshots of the system while it's in a certain state. I still have snapshots of my CentOS 6.10 system in its normal "current" state (where only OORexx was installed from the very beginning) as well as with only Regina installed (after having yum removed OORexx). So let me try with each snapshot, uninstalling the currently installed rexx and installing the other rexx (and then rebuilding Hercules appropriately and re-testing of course). Maybe the uninstall of one or the other is leaving something behind?

In the meantime, please try this on your own CentOS 6.10 system (where I believe you stated OORexx still fails for you, yes?): thoroughly (whatever that means to you) remove both rexxes, and then try installing only OORexx from the rpms provided on the OORexx web site:

The package I used was the "ooRexx-4.2.0-1.centos65.x86_64.rpm" package.

And just for the record (for completeness), I also installed Regina from rpms too, from the following web site:

using the following packages in the below order:

  1. Regina-REXX-lib-3.9.1-1.x86_64-CentOS-6.6.rpm
  2. Regina-REXX-3.9.1-1.x86_64-CentOS-6.6.rpm
  3. Regina-REXX-devel-3.9.1-1.x86_64-CentOS-6.6.rpm

(There also appear to be rpms for Fedora on both of the above-mentioned OORexx and Regina web sites, if that's of any interest to you.)

If my own tests succeed, I would then like to try installing each from source like you have been doing. As I said earlier I don't normally do that, but at this point I'm willing to try anything! So I'd like your help with that, if you're willing to provide it, once I reach that point.

Thanks.

(this issue is starting to tick me off! it's so fricking weird!)

Fish-Git commented 6 years ago

Alright, so how does Hercules determine what the rexx version is, then? The magic appears to be in 'hRexx.c', but my mad l33t coding skillz fail me. ;)

Wrong place. :)

The logic is in source file 'hRexxapi.c', function 'GetRexxVersionSource()'. It calls rexx and asks it to execute an "in-storage" script.

The script that it asks to be executed is defined by macro 'VER_SRC_INSTOR_SCRIPT' in header 'hRexx.h':

/* A simple Rexx "in storage" script to retrieve Rexx version/source */
#define VER_SRC_INSTOR_SCRIPT       /* return results as 2 lines */   \
                                    "parse version ver\n"             \
                                    "parse source  src .\n"           \
                                    "return ver || '0a'x || src\n"

the results of which are placed into variables 'PackageVersion' and 'PackageSource' to eventually be written to the Hercules console in messages HHC17528I and HHC17529I.

Fish-Git commented 6 years ago

the results of which are placed into variables 'PackageVersion' and 'PackageSource' to eventually be written to the Hercules console in messages HHC17528I and HHC17529I.

Which the 'redtest.rexx' script then parses and places into variables '$rexx_VERSION' and '$rexx_SOURCE', which can then be used in test script *If statements (like the one the 3211 tst script is doing).

The key takeaway here is, while it would be easy for 'redtest.rexx' to do its own "parse version" and "parse source" (since it is after all a rexx script and can thus do it directly itself! Duh!), it cannot be guaranteed that the version/source value that it retrieved was the same as what Hercules was using. Thus the purpose of parsing the rexx version/source messages that Hercules issues rather than retrieving it itself.

Thus, if anything goes wrong within Hercules in its attempt to load/start rexx, messages HHC17528I and HHC17529I end up being empty, and as a result so do the 'redtest.rexx' variables '$rexx_VERSION' and '$rexx_SOURCE'.
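To illustrate the idea (this is not the actual 'hrexx' subroutine, which is written in rexx), here is a rough sketch of extracting the two values from Hercules's log output. The message layout `REXX(<package>) <label>: <value>`, the assumption that HHC17528I carries the version and HHC17529I the source, and the sample log lines are all guesses made for this example; check a real allTests.out for the exact text.

```python
import re

# ASSUMED layout: "<file>(<line>)  HHC17528I REXX(<package>) <label>: <value>"
MSG_RE = re.compile(r"\b(HHC17528I|HHC17529I)\s+REXX\([^)]*\)\s+\w+\s*:\s*(.*)$")
KEYS = {"HHC17528I": "rexx_VERSION", "HHC17529I": "rexx_SOURCE"}


def rexx_vars(log_text):
    """Collect rexx_VERSION / rexx_SOURCE from Hercules messages in a log."""
    found = {"rexx_VERSION": "", "rexx_SOURCE": ""}  # empty means "no rexx"
    for line in log_text.splitlines():
        match = MSG_RE.search(line)
        if match:
            found[KEYS[match.group(1)]] = match.group(2).strip()
    return found


# Hypothetical log excerpts, for illustration only.
working = (
    "hRexxapi.c(1000)  HHC17528I REXX(OORexx) VERSION: REXX-ooRexx_4.2.0(MT)_64-bit 6.04\n"
    "hRexxapi.c(1001)  HHC17529I REXX(OORexx) SOURCE: LINUX\n"
)
broken = "hRexx.c(495)      HHC17511E REXX() Could not enable either Rexx package\n"

print(rexx_vars(working))
print(rexx_vars(broken))
```

With the "broken" log both values stay empty, which is exactly the condition the `*If $rexx_VERSION = ''` check in 3211.tst reacts to.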

Fish-Git commented 6 years ago

FYI: I have no idea whether this makes any difference or not but for the record, the reported osversion and kernelversion for my CentOS 6.10 system is "CentOS release 6.10 (Final)" and "2.6.32-754.3.5.el6.x86_64".