Closed Fish-Git closed 3 years ago
Inserting the following at the beginning of the function after the variable declarations seems to fix the problem:
if (arn >= USE_ARMODE)
{
arn &= 0xF;
aea_arn = regs->AEA_AR( arn );
}
BUT... I'm not sure yet whether that's the correct thing to do or not.
More research is needed.
If anyone else out there has any additional thoughts to share on the subject, I'd be very eager to hear them!
Thanks.
Fixed by commit 7c686d5c59b671b2542a5280f24248b3e7c891b7.
Closing.
Hey Fish (@Fish-Git ),
Sorry I had to reopen this issue. The fix that you made to correct the DAT bug in MVCOS in AR mode still isn't right.
I've been working on my test cases for issue #350 and I started running into problems with the AR mode use of MVCOS. Now, you would be right to first question my test case programming. But I have an ace in the hole. I have access to real iron (a z114). My test case works perfectly on the z114 (in a z/VM 6.4 virtual machine), but fails with the latest SDL Hyperion, just cloned and compiled in the last hour. I'm using SATK to assemble the same exact source that I assemble on the z114.
The failing situation is that the PSW address space control is set to Home, and the MVCOS R0 control bits are set to X'00000041' which means the 1st operand uses the Home mode from the PSW, and the 2nd operand uses AR mode. The access register used in the 2nd operand is AR6, and the ALET=0 (use primary space). On Hercules, the MVCOS moves a literal from the Home space into the operand1 location in the home space (bad). On the z114, MVCOS moves a literal from the Primary space (as per AR6) to the operand1 location in the home space (good).
So this is just a heads up. I will be providing full documentation here shortly. I need to distill it down to the essential failing parts so you can duplicate it.
Regards, Bob
EXCELLENT!!
Not that Hercules contains a bug of course, but rather that your test has uncovered it! (which is precisely why I wanted a test case written)
Please keep me informed!
(and THANK YOU, Bob, for your efforts!)
Ok here is the documentation. Start with the 'readme.pdf' inside the zip file. I've also enclosed the test run from the z114 so you can compare results, as well as two runs on Hercules (one with tracing active). I'm happy to answer any questions, of course.
Regards, Bob
Quick note to let you know I'm not ignoring you, Bob. I just haven't had time yet to properly dig into this issue. Your patience and understanding is appreciated. (and again, I really do appreciate your effort thus far!!)
Fish, not a problem. Take your time. Because of the z114, I have been able to continue working on the MVCOS tests.
Bob
Bob (@wably),
Please try the attached patch:
(inline):
From: "Fish" <fish@softdevlabs.com>
Date: Sun, 24 Jan 2021 16:01:28 -0800
Subject: [PATCH] Fix forced AR mode crash in DAT TLB lookup (try #2)
Hopefully this closes #349 for good this time, and helps #350 effort too!
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/dat.c hyperion-0/dat.c
--- hyperion-1/dat.c 2020-12-24 14:37:41.457392400 -0800
+++ hyperion-0/dat.c 2021-01-24 13:20:23.023483300 -0800
@@ -541,10 +541,11 @@
/* Input: */
/* arn Access register number (0-15) to be used if the */
/* address-space control (PSW bits 16-17) indicates */
-/* that ARMODE is the current translation mode. */
+/* that AR-mode is the current translation mode. */
/* An access register number ORed with the special */
-/* value USE_ARMODE forces this routine to use ARMODE */
-/* regardless of the PSW address-space control setting. */
+/* value USE_ARMODE forces this routine to use AR-mode */
+/* address translation regardless of the PSW address- */
+/* space control setting. */
/* Access register 0 is treated as if it contained 0 */
/* and its actual contents are not examined. */
/* Alternatively the arn parameter may contain one */
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/dat.h hyperion-0/dat.h
--- hyperion-1/dat.h 2020-12-24 14:38:22.000000000 -0800
+++ hyperion-0/dat.h 2021-01-24 14:40:27.535965500 -0800
@@ -48,8 +48,9 @@
/* USE_HOME_SPACE */
/* USE_ARMODE + access register number */
/* An access register number ORed with the special */
-/* value USE_ARMODE forces this routine to use ARMODE */
-/* regardless of the PSW address-space control setting. */
+/* value USE_ARMODE forces this routine to use AR-mode */
+/* address translation regardless of the PSW address- */
+/* space control setting. */
/* regs Pointer to the CPU register context */
/* acctype Type of access requested: READ, WRITE, INSTFETCH, */
/* LRA, IVSK, TPROT, STACK, PTE, LPTEA */
@@ -74,18 +75,47 @@
(which is many, many instructions)
*/
- int aea_arn = regs->AEA_AR( arn >= USE_ARMODE ? arn & 0xF : arn );
+ int aea_crn;
U16 tlbix = TLBIX( addr );
- BYTE *maddr = NULL;
+ BYTE *maddr = NULL;
- /* Non-zero AEA Access Register number? */
- if (aea_arn)
+ /* Forced AR-mode address translation? */
+ if (arn >= USE_ARMODE)
+ {
+ /* Retrieve ALET from AR */
+ U32 alet = (ARN( arn ) == 0) ? 0 :
+ /* Guest ALET if XC guest in AR mode */
+ (SIE_ACTIVE( regs ) && MULTIPLE_CONTROLLED_DATA_SPACE( GUESTREGS ))
+ ? GUESTREGS->AR( ARN( arn )) :
+ /* If SIE host but not XC guest in AR mode then alet is 0 */
+ SIE_ACTIVE( regs ) ? 0 :
+ /* Otherwise alet is in the access register */
+ regs->AR( ARN( arn ));
+
+LOGMSG(" +++ USE_ARMODE! +++\n");
+ /* Convert ALET to corresponding CR number, if possible */
+ switch (alet)
+ {
+ case ALET_PRIMARY: aea_crn = 1; break;
+ case ALET_SECONDARY: aea_crn = 7; break;
+ case ALET_HOME: aea_crn = 13; break;
+ default: aea_crn = 0; break;
+ }
+ }
+ else
+ {
+ /* Normal PSW-based address translation */
+ aea_crn = regs->AEA_AR( arn );
+ }
+
+ /* Non-zero AEA Control Register number? */
+ if (aea_crn)
{
/* Same Addess Space Designator as before? */
/* Or if not, is address in a common segment? */
if (0
- || (regs->CR( aea_arn ) == regs->tlb.TLB_ASD( tlbix ))
- || (regs->AEA_COMMON( aea_arn ) & regs->tlb.common[ tlbix ])
+ || (regs->CR( aea_crn ) == regs->tlb.TLB_ASD( tlbix ))
+ || (regs->AEA_COMMON( aea_crn ) & regs->tlb.common[ tlbix ])
)
{
/* Storage Key zero? */
@@ -113,6 +143,9 @@
if (acctype & ACC_CHECK)
regs->dat.storkey = regs->tlb.storkey[ tlbix ];
+if (arn >= USE_ARMODE)
+LOGMSG(" +++ USE_ARMODE TLB HIT! +++\n");
+
maddr = MAINADDR(regs->tlb.main[tlbix], addr);
}
}
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/esame.c hyperion-0/esame.c
--- hyperion-1/esame.c 2021-01-20 19:25:06.000000000 -0800
+++ hyperion-0/esame.c 2021-01-24 13:20:18.093874700 -0800
@@ -1214,7 +1214,7 @@
n = USE_HOME_SPACE;
break;
case 4: /* Use current addressing mode (PSW bits 16-17) */
- n = r2; /* r2 is the access register number if ARMODE */
+ n = r2; /* r2 is the access register number if AR-mode */
break;
default: /* Specification exception if invalid value for m4 */
n = -1; /* makes compiler happy */
It fixes (hopefully correctly!(*)) your provided failing test case.
Thanks.
*`()`** Please take a look at the patch and tell me whether or not you feel I am doing things correctly. I think I am, but am not sure. In any case, you provided test case now passes (completes successfully).
Note: I have not looked into the incorrect instruction tracing operand storage display issue yet. That's next!
p.s. Ignore the two debugging LOGMSGs. I added them to try and see how often forced AR mode translation was being requested/used by e.g. z/OS in normal circumstances (i.e. during an IPL and z/OSMF startup), and interestingly neither message appeared even a single time. But then I did not try the test case documented in this GitHib Issue yet either, which probably would cause both messages to be issued. How many times, I'm not sure. But suffice to say the specific condition (the specific bug) that my patch fixes is apparently extremely rare.
But then I did not try the test case documented in this GitHib Issue yet either, which probably would cause both messages to be issued. How many times, I'm not sure.
Okay, I just got around to trying it with z/OS 2.3 and the above "//MYJOB ... SCHEDULE STARTBY..." test case, and the "USE_ARMODE!" message appeared 44 times, and the "USEARMODE TLB HIT!" message appeared 22 times (and the originally reported crash did not occur either of course!)_, so I think my patch is probably good.
I'm tempted to commit it as-is (but without the LOGMSGs of course!), but I'd feel better waiting for your approval/blessing/okay before doing so.
Thanks.
Fish,
I applied your patch this morning on a fresh clone of the repo, and it passes all my tests (196 of them, at present). The results now match that of the z114.
Nice job, thanks! Please feel free to commit your patch at your convenience.
Regards, Bob
Fixed (AGAIN! Hopefully for good this time!) by commit e74bec604ee3a46dffe317a4543a49fb216dcc73.
This issue is closely related to Issue #350.
A user has reported that Hercules is crashing for him when the following jobstream is submitted to z/OS 2.2 or greater:
I have been able to reliably reproduce this crash on my Windows system using the latest bleeding edge development version of Hercules, version 4.4.9999.10344-SDL-gf4a8b54b.
HOWEVER... it should be noted that the bug that appears to be causing this crash was introduced into the Hercules code base over 12(!) years ago (by Roger Bowler no less! *`()`**), so the crash should be reproducible by anyone using pretty much any version of Hercules (and any version of z/OS 2.2 or greater of course).
WinDBG
!analyze -v
analysis of the crash dump produces the following:As you can see, Hercules is crashing in the
dat.h
functionARCH_DEP( maddr_l )
(commonly invoked by theMADDR
andMADDRL
macros) in its very first test for a TLB Hit.The reason it is crashing is because the value of
arn
passed into the function has the valueUSE_ARMODE | b1
(i.e. the value 30 in this particular case: USE_ARMODE (which is #defined to 16) "OR"ed with the operand access register number (which in this particular case was 14)).This causes the
ARCH_DEP( maddr_l )
statement:to load a garbage
aea_arn
value due to theaea_ar_struct
array subscript (5 + arn
) used by theAEA_AR
macro being way out of bounds, which of course ends up causing the crash the moment the first TLB Hit condition is reached andregs->CR( aea_arn )
and/orregs->AEA_COMMON( aea_arn )
are evaluated.*`()`** The following two commits introduced the bug:
Revision: ed4d97402623265b30d9f15db1b44fcb5e6e1d91: Author: Roger Bowler
Date: 3/15/2008 5:04:37 PM
Message: Replace ACC_ARMODE by USE_ARMODE for LPTEA
Modified: dat.h
Modified: esame.c
Modified: hconsts.h
Revision: f440c1bcf72faafca18d82055f3c77e6950abfca Author: Roger Bowler
Date: 3/15/2008 5:09:57 PM
Message: Add MVCOS instruction (part 2)
Modified: control.c
Modified: vstore.h
This issue is closely related to Issue #350.