SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

RDP (xrdp) failure with s390x Ubuntu 18.04.6 #551

Closed Peter-J-Jansen closed 1 year ago

Peter-J-Jansen commented 1 year ago

This issue was encountered with the latest development branch of SDL-hyperion 4.5, hosted on an Intel x86-64 based Ubuntu 18.04 PC.

An ABEND S0C4 Reason 003B occurred very occasionally when running z/OS 2.4, but could not be analyzed any further. I presumed, probably incorrectly, that it was related to the Vector Facility not being disabled under z/OS, so the problem, which was only superficially investigated, was ignored.

But when running s390x Ubuntu 18.04.6 (i.e. fully updated), the same problem occurred. It is reproducibly caused by attempting an XRDP connection to this Ubuntu. And under s390x Ubuntu, some diagnostic data is automatically delivered:

13:56:28 HHC00814I Processor CP01: SIGP Initial CPU reset                (0B) CP02, PARM 0000000000000000: CC 0
13:56:28 HHC00814I Processor CP01: SIGP Set prefix                       (0D) CP02, PARM 000000007FEE8000: CC 0
13:56:31 HHC00814I Processor CP00: SIGP Set prefix                       (0D) CP02, PARM 0000000000000000: CC 0
13:56:45    332.196837! User process fault: interruption code 003b ilc:2 in libc-2.27.so
13:56:45 3ff80b80000+199000!
13:56:45    332.196986! Failing address: 0000000000000000 TEID: 0000000000000800
13:56:45    332.197043! Fault in primary space mode while using user ASCE.
13:56:45    332.197127! AS:00000001d506c1c7 R3:0000000000000024
13:56:49 HHC00814I Processor CP00: SIGP Initial CPU reset                (0B) CP02, PARM 0000000000000000: CC 0
13:56:49 HHC00814I Processor CP00: SIGP Set prefix                       (0D) CP02, PARM 000000007FEE8000: CC 0
13:56:52 HHC00814I Processor CP00: SIGP Set prefix                       (0D) CP02, PARM 0000000000000000: CC 0
[  332.196837] User process fault: interruption code 003b ilc:2 in libc-2.27.so[3ff80b80000+199000]
[  332.196986] Failing address: 0000000000000000 TEID: 0000000000000800
[  332.197043] Fault in primary space mode while using user ASCE.
[  332.197127] AS:00000001d506c1c7 R3:0000000000000024
[  332.197229] CPU: 0 PID: 2536 Comm: xrdp-sesman Not tainted 4.15.0-206-generic #217-Ubuntu
[  332.197251] Hardware name: HRC 2828 EMULATOR EMULATOR (LPAR)
[  332.197279] User PSW : 0000000077736309 00000000d72854d8
[  332.197331]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[  332.197380] User GPRS: 0000000000000000 000003ff81076750 0000000000000000 000002aa15e7d320
[  332.197411]            000002aa15e7d6e0 0000000000000001 0000000000000000 000002aa15e7d510
[  332.197442]            000000000000000b 000003ffffb7e168 000002aa15e7d320 000003ffffb7dc40
[  332.197473]            000003ff81026000 000002aa15e77790 000003ffffb7dcf8 000003ffffb7dc40
[  332.197614] User Code: 000003ff80c4481c: b24f0010            ear     %r1,%a0
                          000003ff80c44820: eb110020000d        sllg    %r1,%r1,32
                         #000003ff80c44826: b24f0011            ear     %r1,%a1
                         >000003ff80c4482a: 95002000            cli     0(%r2),0
                          000003ff80c4482e: d207b0b01028        mvc     176(8,%r11),40(%r1)
                          000003ff80c44834: a784004d            brc     8,000003ff80c448ce
                          000003ff80c44838: b3c10083            ldgr    %f8,%r3
                          000003ff80c4483c: b3c100a4            ldgr    %f10,%r4
[  332.198401] Last Breaking-Event-Address:
[  332.198436]  [<000003ff80c44b04>] 0x3ff80c44b04

The way program interruption code 3B is generated in "dat.c" looks OK to me, but I realize that my understanding of the intricate DAT details around Access Register (AR) access is currently no help whatsoever in finding the root cause of this problem or fixing it. Please feel free to ask for any actions I could undertake to help resolve this issue.
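For readers following the log, here is a minimal Python sketch that turns the program-interruption codes mentioned in this thread into readable names. The table is only a small subset of the codes listed in the z/Architecture Principles of Operation, and the names `PGM_CODES` and `describe_pgm_code` are invented for this illustration; this is not how "dat.c" or the Linux kernel does it.

```python
# Subset of z/Architecture program-interruption codes (hex), for illustration only.
PGM_CODES = {
    0x0001: "Operation exception",
    0x0004: "Protection exception",
    0x0005: "Addressing exception",
    0x0006: "Specification exception",
    0x0010: "Segment-translation exception",
    0x0011: "Page-translation exception",
    0x0038: "ASCE-type exception",
    0x0039: "Region-first-translation exception",
    0x003A: "Region-second-translation exception",
    0x003B: "Region-third-translation exception",
}


def describe_pgm_code(code: int) -> str:
    """Return a readable name for a program-interruption code (hypothetical helper)."""
    return PGM_CODES.get(code, f"unknown code {code:04X}")


if __name__ == "__main__":
    # The kernel message above reports "interruption code 003b".
    print(describe_pgm_code(0x003B))
    # A code that is not in this small table falls through to the default.
    print(describe_pgm_code(0x0007))
```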

Cheers,

Peter

Fish-Git commented 1 year ago

Have you tried the same thing on real iron (e.g. LinuxOne) or on e.g. a z/PDT? Does it only fail on Hercules?

Fish-Git commented 1 year ago

Peter: how do I establish an XRDP connection to Ubuntu? I have s390x Ubuntu 18.04 that runs under Hercules, and I also have a real iron Ubuntu 22.04 system running on my LinuxOne account, so I'd like to try it on both. What do I need to do to establish an XRDP connection? Thanks.

Peter-J-Jansen commented 1 year ago

Thanks for the feedback Fish, I'll need a fresh s390x Ubuntu 18.04 install on zPDT, as my Hercules one is on a CCKD64 compressed DASD.

I'm not sure the installation will work on zPDT, as an earlier attempt with 22.04 failed, not with a disabled wait state, but due to console interface problems in zPDT, at the stage where one has to configure the network interface. I might have to perform the install under VM on zPDT. I'll also try a test for this 0C4-3B on real iron afterwards.

XRDP on regular PC Ubuntus works just fine. I use it on all of my Ubuntu boxes, which are actually LXDE Ubuntu, i.e. Lubuntu. Nowadays, it merely needs a regular "xrdp" install:

sudo apt-get install xrdp

In the past that was a bit more involved, notably for the s390x LXDE Ubuntu 18.04.1 under Hercules some years ago. When I've finished my installation under zPDT, I'll send you my installation notes directly.

Cheers,

Peter

Fish-Git commented 1 year ago

One thing I noticed myself after installing xrdp on my Hercules Ubuntu 18.04 system was an operation exception on a Vector instruction:

12:20:05.836 HHC00801I Processor CP00: Operation exception interruption code 0001 ilc 6
12:20:05.837 HHC02269I CP00: R0=0000000000000000 R1=000002AA0B276390
12:20:05.837 HHC02269I CP00: R2=000002AA0B276390 R3=0000000000001740
12:20:05.837 HHC02269I CP00: R4=000003FF9EF29506 R5=0000000000000000
12:20:05.837 HHC02269I CP00: R6=000002AA0B2258E0 R7=000002AA0B2258E0
12:20:05.838 HHC02269I CP00: R8=000003FFF01FECE8 R9=000003FFF01FCA40
12:20:05.838 HHC02269I CP00: RA=000003FF9EF58C60 RB=000003FFA4B38938
12:20:05.838 HHC02269I CP00: RC=000003FFA4B304E0 RD=000003FFA4B38110
12:20:05.838 HHC02269I CP00: RE=000003FF9FD97C74 RF=000003FFF01FC900
12:20:05.838 HHC02271I CP00: C0=0080000014846A10 C1=000000015852C1C7
12:20:05.838 HHC02271I CP00: C2=0000000000011140 C3=0000000000000000
12:20:05.838 HHC02271I CP00: C4=000000000000FFFF C5=0000000000011140
12:20:05.838 HHC02271I CP00: C6=0000000010000000 C7=000000016CA281C3
12:20:05.838 HHC02271I CP00: C8=0000000000000000 C9=0000000000000000
12:20:05.838 HHC02271I CP00: CA=0000000000000000 CB=0000000000000000
12:20:05.838 HHC02271I CP00: CC=0000000000000000 CD=0000000000ED0007
12:20:05.839 HHC02271I CP00: CE=00000000DB000000 CF=0000000000011280
12:20:05.839 HHC02324I CP00: PSW=0705000180000000 000003FF9FD97C74 INST=E70000000044 ????? ,                      ?
12:20:05.839 HHC02326I CP00: V:0000000000000000: Translation exception 003B (Region-third-translation exception)
12:20:05.839 HHC02326I CP00: V:0000000000000044: Translation exception 003B (Region-third-translation exception)

E7 is a z/Arch Vector instruction, which Hercules doesn't support yet. (If you try enabling bit 129, the IPL goes into a hard wait.)

(Note: the above 003B Region-third-translation exceptions are red herrings (false leads). They're displayed by the function that prints the instruction and occur because it's unable to access the operand data. The actual exception itself was not a 3B, but rather a 01 Operation exception.)
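To make the INST=E70000000044 field in the trace above easier to read, here is a small Python sketch. It relies on two z/Architecture conventions: the two leftmost bits of the first opcode byte give the instruction length (which is why Hercules reports "ilc 6" here), and for E7-prefixed instructions the second half of the opcode sits in the last byte. The function names are invented for this illustration and are not Hercules code.

```python
def instruction_length(first_byte: int) -> int:
    """Instruction length in bytes, from the two leftmost bits of the opcode.

    00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes.
    """
    top_bits = (first_byte >> 6) & 0b11
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_bits]


def e7_opcode(inst: bytes) -> str:
    """Combine byte 0 and byte 5 of a 6-byte E7xx instruction into one opcode string."""
    if len(inst) != 6 or inst[0] != 0xE7:
        raise ValueError("expected a 6-byte instruction starting with E7")
    return f"{inst[0]:02X}{inst[5]:02X}"


if __name__ == "__main__":
    inst = bytes.fromhex("E70000000044")  # taken from the HHC02324I line above
    print(instruction_length(inst[0]))    # length in bytes
    print(e7_opcode(inst))                # two-part opcode as a hex string
```

For the instruction in the trace this yields a length of 6 bytes and the opcode E744, i.e. an E7-family (vector) instruction, consistent with the Operation exception being the real failure.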

I can't remember exactly when the above Operation Exception occurred. Did it happen on its own? Or did it happen whenever I tried connecting with RDP? I can't remember! Sorry.

The only thing I remember is that after connecting/logging in via RDP (x11rdp?) I get a blank screen. It looks like a desktop, but with no icons or task bar. Just a blank desktop. Clicking anywhere does nothing.

But then that's probably because I suspect my Ubuntu was installed without any GUI support. (Someone else set it up for me and just sent me the Herc dasds and config file.)

Maybe if I try installing it myself, making sure GUI support is specified, and then install xrdp afterwards and try again, the results might be different? I'll have to try that when I get a chance.

Does RDP otherwise work okay for you? Did you install GUI support when you installed/setup your Ubuntu 18.04?

I'm beginning to suspect that doing sudo apt-get install xrdp ends up installing the latest and greatest version of xrdp, which likely presumes the z/Arch Vector Facility support to be available. (The install took a long time, and installed /many/ things! 441MB of stuff! Sheesh!)

Do you know if there is a way to install the xrdp package that existed when Ubuntu 18.04 was first released? Is there maybe a place somewhere where you can download an older version of it?

Peter-J-Jansen commented 1 year ago

I can confirm your findings re. the operation exception and the 003B exception then being a red herring, and that it occurs, reproducibly so, when trying to RDP connect.

My understanding is that for XRDP to function, the Ubuntu indeed needs some GUI support, which is in my case LXDE (i.e. Lubuntu). Of course with the broken xrdp, it's no longer usable under Hercules.

The xrdp versions installed are Ubuntu version specific. For Ubuntu 1804 that's xrdp 0.9.5-2. Newer Ubuntus, i.e. 20.04 or 22.04, have different, higher xrdp version numbers. I was unable to find the xrdp 0.9.5-2 predecessor, trying to circumvent what is clearly a s390x xrdp build error, probably because at build time GCC (or whatever compiler was used) was run with the wrong s390x options for 18.04, i.e. the -mvx option (which is OK for the later Ubuntu versions as these specifically require the vector stuff). One could try to report this as an error against xrdp 0.9.5-2 s390x, but we're coming close to the end of the 5-year support for 18.04 LTS, so I'm doubtful it will bring anything.

In short, the only positive is that this xrdp failure is NOT a Hercules bug. It also confirms my original suspicion that the 003B exceptions I observed in z/OS were indeed caused by me erroneously not disabling the vector instructions.

It's also another argument in favor of working on adding vector instructions to Hercules.

I suggest we close this issue. OK?

Cheers,

Peter

Fish-Git commented 1 year ago

My understanding is that for XRDP to function, the Ubuntu indeed needs some GUI support,

Which is why I'm considering trying to reinstall my Ubuntu 18.04 system on Hercules, but this time with GUI support specified.

UNLESS... it's possible to install GUI support after the fact (i.e. after the operating system has already been installed)? Do you know if that's possible? That would probably be a lot faster than reinstalling the entire operating system again from scratch.

For Ubuntu 1804 that's xrdp 0.9.5-2.   [...]   I was unable to find the xrdp 0.9.5-2 predecessor,

We don't need a predecessor. (Do we?) We only need 0.9.5-2. Yes? Were you able to find that version anywhere? Or the source for it? Maybe we can re-build it without the -mvx option and then try installing the result?

Yes, I know this is becoming quite complicated/involved just to research such a seemingly minor issue. Probably more trouble than it's worth. But I really prefer trying to prove things conclusively when possible, rather than just reaching a "presumed" (unconfirmed/unproven) conclusion. But I readily admit there is a definite cost-benefit limit to such efforts, and this issue is dangerously close to exceeding that limit.

... trying to circumvent what is clearly a s390x xrdp build error, ...

Oh? Has that been conclusively determined yet? I agree that, given the nature (symptoms) of the bug, it is very likely a bug in xrdp, but have our tests proven that yet? The only thing that we have established so far is that it fails on Hercules with 18.04, and works fine on "real hardware" with 22.04. But unless I missed something, what we have not determined yet is whether or not it works on real hardware with 18.04. Only once we determine that will we then know for certain whether it's a Hercules bug or not.

You mentioned in an offline email that you were successful in getting xrdp working on Ubuntu 22.04 on "real hardware", but I'm more interested in whether it works or not with 18.04 on real hardware.

In short, the only positive is that this xrdp failure is NOT a Hercules bug.

Again, how did you determine this?

It's also another argument in favor of working on adding vector instructions to Hercules.

Wholeheartedly agreed! Which is proving to be quite problematic for me, as it's looking like it's going to require an almost complete re-write of our existing floating point support (float.c, dfp.c, ieee.c, etc) due to the way floating point registers are currently being accessed.   :(

I suggest we close this issue. OK?

But are we done yet? I don't think we are. We need to try Ubuntu 18.04 on real hardware first, don't we? Only then will we know for certain whether the "bug" is in xrdp or Hercules. Only then will we know whether xrdp requires (or presumes) Vector Facility support or not. At this point, we are simply presuming it does. But unless I've missed something, we haven't proven that yet. Have we?

mcisho commented 1 year ago

I suggest that the problem isn't necessarily anything to do with any particular version of xrdp, or even any particular version of Ubuntu.

I had a similar problem with Fedora (see issue #367) where, following an update, Fedora would abend during startup. The problem turned out to be that a change to the libc package was assuming that the presence of Miscellaneous-Instruction-Extensions Facility 3 (facility bit 61) meant that Vector Facility for z/Architecture (facility bit 129) was also present, and so Vector instructions were executed.

In my case adding a FACILITY DISABLE 061_MISC_INSTR_EXT_3 statement to the Hercules config avoided the problem. Perhaps it might in this case?
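The facility-bit assumption described above can be sketched in Python. Facility bits are numbered from the left, so bit 0 is the most significant bit of the first byte of the facility list. The facility list below is made up for the illustration (bit 61 on, bit 129 off, roughly what a guest would see on Hercules without vector support), and the helper names are hypothetical; this is not the actual libc code.

```python
def make_facility_list(set_bits, nbytes=32):
    """Build a hypothetical facility list with the given bit numbers turned on."""
    buf = bytearray(nbytes)
    for bit in set_bits:
        buf[bit // 8] |= 0x80 >> (bit % 8)
    return bytes(buf)


def facility_enabled(facility_list: bytes, bit: int) -> bool:
    """Test one facility bit; bit 0 is the leftmost bit of byte 0."""
    byte_index, bit_in_byte = divmod(bit, 8)
    if byte_index >= len(facility_list):
        return False
    return bool(facility_list[byte_index] & (0x80 >> bit_in_byte))


MISC_INSTR_EXT_3 = 61   # Miscellaneous-Instruction-Extensions Facility 3
VECTOR_FACILITY = 129   # Vector Facility for z/Architecture

if __name__ == "__main__":
    # Made-up list: bit 61 present, bit 129 absent.
    facilities = make_facility_list({MISC_INSTR_EXT_3})

    # Flawed shortcut: treat bit 61 as proof that vector instructions exist.
    assumed_vector = facility_enabled(facilities, MISC_INSTR_EXT_3)

    # Safer check: test bit 129 itself.
    actual_vector = facility_enabled(facilities, VECTOR_FACILITY)

    print(assumed_vector, actual_vector)
```

With this made-up list the shortcut reports vector support while the direct test of bit 129 does not, which is the mismatch that disabling facility 61 works around.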

Fish-Git commented 1 year ago

In my case adding a FACILITY DISABLE 061_MISC_INSTR_EXT_3 statement to the Hercules config avoided the problem. Perhaps it might in this case?

Great suggestion! Thanks, Ian!

Unfortunately however, it doesn't seem to help any in this particular case; the exact same abend still occurs.  :(

That was with the original 18.04 version. I also downloaded 18.04.1, 18.04.2, 18.04.3 and 18.04.4 too though, so I could try each of them in turn to see if it makes any difference with any of them, but I'm not exactly excited about doing so due to how incredibly long it takes to perform an install. (It takes an entire freaking day!)

However, whenever I login, I notice that it reports:

670 packages can be updated.
470 updates are security updates.

so I'm thinking a more worthwhile test to perform might be to apply all of the available updates first. Either that or maybe jumping right to trying version 18.04.4 next. If it doesn't work any better, even after applying all updates, then I don't think anything is going to work.  :(

I sure wish we could try this version of Ubuntu on real hardware!  :(

Fish-Git commented 1 year ago

I sure wish we could try this version of Ubuntu on real hardware!  :(

(Oops!) I forgot about the offline email I received from you where you reported xrdp worked just fine on 18.04 on the z/PDT (which is close enough to "real hardware" for our purposes).

So I guess the next test is to apply all the updates to see if that fixes the problem or not. I'm about to do that now. I'll let you know how it goes.

Fish-Git commented 1 year ago

So I guess the next test is to apply all the updates to see if that fixes the problem or not. I'm about to do that now. I'll let you know how it goes.

I changed my mind.

Instead of trying again with the s390x Ubuntu 18.04.0 that I installed, I instead did a brand new install of s390x Ubuntu 18.04.3, following the same procedure that you (Peter) documented in your offline email to me, and it worked just fine! I was able to connect via RDP just fine!

[Screenshot: s390x Ubuntu 18.04.3 RDP session]

No error occurred. There was no Region 3rd Translation Exception.

So I am satisfied that this is definitely not a Hercules bug, but is rather an obvious xrdp "bug" in specific version(s) of s390x Ubuntu.

I use "bug" within quotation marks because it might not actually be a bug per se, but rather might possibly be a design decision. It might well be that the Ubuntu developers decided that xrdp should always use zVector Facility instructions which is why it doesn't work on later versions of s390x Ubuntu on Hercules. Or it could actually be a legitimate bug in xrdp that wasn't discovered and fixed until a later version.

Either way, it's definitely not a bug in Hercules, so I am happy to now close this issue as you originally suggested Peter. I apologize for the delay, but I wanted to be sure for myself.

Since I'm not sure how to categorize this issue, I am closing this issue with the labels "Invalid" and "Unknown" and "Won't Fix" too (i.e. all three labels).

Thanks, and I appreciate your patience with this sometimes stubborn and skeptical fellow developer.  :)  

Peter-J-Jansen commented 1 year ago

Thanks Fish for confirming all this, and for reporting that even today one can still install a "back-level" Ubuntu 18.04.3 where xrdp works.

At the same time I can also confirm that unfortunately Ian's workaround with "facility disable 61" does not work on Ubuntu 18.04.6; xrdp still exhibits the problem as reported in this Issue.

Cheers,

Peter