SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
242 stars 91 forks source link

Performance issue in z/Arch architecture mode (visible under SIE) #222

Closed ivan-w closed 5 years ago

ivan-w commented 5 years ago

Running the actual same REXX program under CMS (ESA architecture) and z/CMS (ESAME) yield a more than 2 times performance degradation.

******************
i zcms
z/CMS V6.3.0   2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.07/0.09 02:42:57
toto
Ready; T=69.01/69.14 02:44:09
i cms
z/VM V6.3.0    2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.04/0.05 02:44:24
toto
Ready; T=29.52/29.58 02:44:55
******************

  toto exec a:

/* rexx */               
Z=0                
DO I = 1 TO 1000000
  Z=Z+I            
END 

A completely trivial test...

I will investigate this further...   (is it a SIE issue? is it a z/Arch issue?)

Juergen-Git commented 5 years ago

Just gave your REXX program a try under zPDT, on which only a 2.5% performance degradation can be seen for z/CMS. However, the whole thing is a factor 10 faster there, so possibly the observation is a no-brainer and I'm only seeing JIT artifacts.

If it's not JIT related, the probability is high that there are one or more instructions (prominently SIE, but also any z/Arch instruction that is frequently used by REXX on z/CMS, or by z/CMS itself) not implemented optimally on Hercules.

It would be interesting to see, how this behaves on real iron... but I don't currently have access to a real z machine.

ivan-w commented 5 years ago

If z/PDT (which is in my reasoning) an actual IBM z/Arch machine shows only a minor degradation, then it's possible we're missing something in hercules (when running in z/Arch architecture under SIE)... I probably need to determine if we have a specific instruction that is slowing down things horribly!

mcisho commented 5 years ago

On a z13 (unnecessary messages removed):

i zcms
z/CMS V6.4.0   2018-08-16 12:31
Ready; T=0.01/0.01 06:19:32
toto
Ready; T=0.66/0.66 06:19:37

i cms
z/VM V6.4.0    2018-08-16 12:31
Ready; T=0.01/0.01 06:19:55
toto
Ready; T=0.65/0.65 06:19:59

i zcms
z/CMS V6.4.0   2018-08-16 12:31
Ready; T=0.01/0.01 06:20:16
toto
Ready; T=0.65/0.65 06:20:19

i cms
z/VM V6.4.0    2018-08-16 12:31
Ready; T=0.01/0.01 06:21:33
toto
Ready; T=0.65/0.65 06:21:37
ivan-w commented 5 years ago

Thanks! So we're obviously doing something wrong!

mcisho commented 5 years ago

Would you like an instruction trace? With the do loop two times the trace is 23K lines.

ivan-w commented 5 years ago

I can do the instruction trace myself with z/VM 6.3 running z/CMS under hercules. ;) (it shouldn't be that different from a z/VM 6.4 z/CMS)

Fish-Git commented 5 years ago

FWIW, here's what Mike Cowlishaw's REXXCPS EXEC reports for z/OS 2.3 under Herc, z/VM 6.3 under Herc (for both CMS and ZCMS), from Hercules itself and from Windows.

What's interesting perhaps, is the age of the corresponding Rexx interpreter each uses! (and the resulting speed difference).

But yeah, you're right, Ivan: for some as-yet-unknown reason, ZCMS performance really sucks, being about 50% slower than CMS. I wonder why?

********************************************************************************
From z/OS 2.3 TSO option 6:

             ----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
              REXX version is: REXX370 3.48 01 May 1992
                    System is: TSO
                    Averaging: 10 measures of 30 iterations

                  Performance: 23742 REXX clauses per second

********************************************************************************
From z/VM 6.3:

IPL CMS:

             ----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
              REXX version is: REXX370 4.02 01 Dec 1998
                    System is: CMS
                    Averaging: 10 measures of 30 iterations

                  Performance: 65921 REXX clauses per second

IPL ZCMS:

             ----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
              REXX version is: REXX370 4.02 01 Dec 1998
                    System is: CMS
                    Averaging: 10 measures of 30 iterations

                  Performance: 30320 REXX clauses per second

********************************************************************************
From Hercules (exec "scriptname"):

17:19:18.702 ----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
17:19:18.702  REXX version is: REXX-ooRexx_4.2.0(MT)_64-bit 6.04 22 Feb 2014
17:19:18.702        System is: WindowsNT
17:19:18.703        Averaging: 10 measures of 30 iterations
17:19:18.770 
17:19:18.770      Performance: 4545455 REXX clauses per second

********************************************************************************
From Windows command line:

             ----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
              REXX version is: REXX-ooRexx_4.2.0(MT)_64-bit 6.04 22 Feb 2014
                    System is: WindowsNT
                    Averaging: 10 measures of 30 iterations

                  Performance: 5000000 REXX clauses per second

********************************************************************************
ivan-w commented 5 years ago

Yeah.. Let's leave that one open for the time being. I'll finish what I am currently doing (get rid of all the warnings in GCC, some code cleanup, minor refactoring, etc...) Doing it all in very small chunks so if any problem crops up due to my changes, they'll be easy to spot using git bisect.

ivan-w commented 5 years ago

PS: It may not be a z/SIE issue but a z/Arch problem (the z/OS TSO test is in the same order of magnitude as ZCMS). Were you using the same host system for z/OS and z/VM? Or was it z/OS under z/VM?

Fish-Git commented 5 years ago

Were you using the same host system for z/OS and z/VM?

Yes, of course! Hercules on Windows.

Or was it z/OS under z/VM?

No, of course not! z/OS native and z/VM native. (under Hercules of course!)

ivan-w commented 5 years ago

So apparently it's a SIE related issue. ;) That kind of narrows down the issue! Thanks!

ivan-w commented 5 years ago

NOT a SIE related issue! DUH!

Fish-Git commented 5 years ago

... the probability is high that there are one or more instructions (prominently SIE, but also any z/Arch instruction that is frequently used by REXX on z/CMS, or by z/CMS itself) not implemented optimally on Hercules.

I could be wrong but I'm doubting it's SIE.

Instead, I suspect it's more than likely one or more z/Arch instructions that are simply coded inefficiently in Hercules.

But that's just a pure 100% guess. I really have no idea!

ivan-w commented 5 years ago

I agree... We may be doing something stoopid (we aren't past that) in z/Arch that flew under the radar because, although it is correct, it may be utterly inefficient!

Fish-Git commented 5 years ago

@mcisho Ian,

Would you like an instruction trace? With the do loop two times the trace is 23K lines.

Do you still have your trace? I would personally like to see it! Or I can do it myself if you can tell me how. (It's been so long since I've done any type of VM debugging that I've forgotten how! Any help you can give me would be greatly appreciated. Thanks.)

mcisho commented 5 years ago

Hi Fish, no I don't have the original trace any more, but I've done another. It's slightly different, I turned the trace on and off in the rexx, so there are <12K lines. The zip contains the exec and the trace.

mcisho commented 5 years ago

My apologies, I had forgotten to include the zcms version of the trace in the zip. The new zip contains the exec and the cms and zcms traces.

Fish-Git commented 5 years ago

The new zip contains the exec and the cms and zcms traces.

Thanks! I'll take a look at it as soon as I can.

(I kind of thought it was as simple as that, but it's been so feaking long since I've worked at any detailed level in VM that I had forgotten how! Good to know it was as easy as it was. Thanks.)

Fish-Git commented 5 years ago

Maybe if we update the Rexx interpreter on each of the two platforms (z/OS and z/VM, but especially z/VM since that's where the problem appears to be) the problem might go away? (or not be as bad?)

Found this of IBM's site:

It wasn't until the year 2000 that IBM came out with 64-bit support after all! A newer version of their Rexx interpreter might have much more efficient z/Architecture support thereby greatly reducing the performance difference between the two? (32-bit vs. 64-bit)

I have no friggin' idea how to go about obtaining such an upgrade or how to install it, but maybe one of you do? Maybe it's worth a shot?

ivan-w commented 5 years ago

Fish

The REXX compiler is a separate product. It's not part of stock z/VM

The fact that a CMS REXX (S/390 Arch) program and the same ZCMS (z/Arch) REXX program runs twice as slow is not the actual problem.. It's a symptom of an underlying problem I'd like to resolve. The REXX interpreter is the same for CMS and ZCMS (even if you are in z/Arch mode it's running AMODE 31).

--Ivan

ivan-w commented 5 years ago

Thinking out loud here...

I am saying is that there is a possibility some instruction/instructions (or the underling facilities that are being used by those instructions) are running significantly slower in z/Arch mode than in S/390 mode EVEN when the addressing mode is 31 bit.

At first, I was thinking a TLB/MMU/DAT issue but it probably isn't since when running under SIE, even if the guest is S/390, the host is z/Arch (where the actual translation occurs).

Then I thought about registers.. (32 vs 64 bit) - but IIRC, even in S/390 mode, the registers in the REGS structure are 64 bits anyway (the upper word just not being accessible in S/390 mode).

There must be a special case slowing down stuff when running z/Arch mode.. it may or may not be restricted to AMODE 31.

Fish-Git commented 5 years ago

EDITOR'S NOTE: the below comment is being retained for historical purposes only. The reason it is crossed out is because it is in error. This is explained in a subsequent comment by the original author.


@ivan-w @Juergen-Git @mcisho

Ivan, Jürgen, Ian, (or anyone else!)

Can any of you please explain the following to me? Because it doesn't make any sense to me!

Purely out of curiosity, I inserted some test code into the logical_to_main_l function in dat.h, that counts the number of TLB hits and misses for both the z/VM guest and z/VM itself (based on the regs->sie_mode flag).

If you look, at the beginning of logical_to_main_l, the very first thing it does is the following test:

https://github.com/SDL-Hercules-390/hyperion/blob/287d2d40d5f4a7a6ce05954c37f0f3ebfc34cb8b/dat.h#L2189-L2203

and then, if the condition is TRUE, retrieves and uses the previously translated values from the TLB, -OR-, if the condition is FALSE, does a full address translation (by calling the translate_addr function):

https://github.com/SDL-Hercules-390/hyperion/blob/287d2d40d5f4a7a6ce05954c37f0f3ebfc34cb8b/dat.h#L2204-L2220

This is where I inserted my test code to count how often we did a full translation (i.e. a TLB miss) as opposed to using the previously translated dat values (i.e. a TLB hit).

The counts I'm getting for when I run Ivan's dirt simple pure-cpu-bound toto exec under ipl cms as opposed to running it under ipl zcms don't make any sense to me! Something very weird is going on somewhere! (either that or I'm just not understanding something correctly)

Please see the attached patch and test run:

Within the above .zip file is the patch containing my test code, and the resulting Hercules log file from a test run I just did. You'll see the test display near the very end of the log file.

The cms test run ran for about 21 seconds and displayed the following counts:

02:14:44.467 HHC00001D
02:14:44.467 HHC00001D HOST:              171 hits
02:14:44.467 HHC00001D HOST:        1,164,223 misses
02:14:44.468 HHC00001D HOST:        1,164,394 TOTAL
02:14:44.468 HHC00001D
02:14:44.468 HHC00001D GUEST:          97,909 hits
02:14:44.468 HHC00001D GUEST:               0 misses
02:14:44.468 HHC00001D GUEST:          97,909 TOTAL
02:14:44.469 HHC00001D

(The "HOST" counts are for regs->sie_mode == 0, and the "GUEST" counts are for regs->sie_mode == 1; see my attached patch)

Then I did ipl ZCMS and did my test (which took 2 minutes and 35 seconds, 7 times longer!), which displayed the following counts:

02:17:01.123 HHC00001D
02:17:01.123 HHC00001D HOST:              383 hits
02:17:01.123 HHC00001D HOST:        2,612,496 misses
02:17:01.124 HHC00001D HOST:        2,612,879 TOTAL
02:17:01.124 HHC00001D
02:17:01.124 HHC00001D GUEST:   1,075,351,149 hits
02:17:01.124 HHC00001D GUEST:               0 misses
02:17:01.125 HHC00001D GUEST:   1,075,351,149 TOTAL
02:17:01.125 HHC00001D

WTF is going on?!

Why, when running ZCMS is the host (z/VM itself) doing twice as many calls to logical_to_main_l (99.9% of which are TLB misses!) and the guest doing 11,000 times(!) as many logical_to_main_l calls, all of which are TLB hits!   (WTF?!)

HELP!

I'm confused!   :(

ivan-w commented 5 years ago

Just a side note:

It's not z/VM that's doing the translation per-se, it's the Hercules DAT code that detects a SIE guest REGS, invokes the Guest REGS translation process and THEN invokes the translation process of the underlying REGS (the host) - thus creating a shadow table. The fact there are only hits under CMS or ZCMS is probably because CMS/ZCMS always run with (guest) DAT off.

As for the rest, I'll look because it's a good point where to start!

--Ivan

ivan-w commented 5 years ago

PPS: There is nothing in SIE that prevents SIE from being invoked in a recursive way! Hercules has a limit of 1.. IBM manufactured systems have a limit of >=2.. But as far as storage is concerned, Hercules doesn't put a limit.

But as you said.. from your previous tests, the problem is not SIE in itself, but probably more something to do with z/Arch DAT/MMU which is not "broken" but possibly highly ineffective!

Fish-Git commented 5 years ago

@ivan-w @Juergen-Git @mcisho

NEVER MIND!

My test is bogus!

My code is in the wrong place!

(I didn't study the code enough; it was late and I was tired; sorry!)

EVERY call to logical_to_main_l is a TLB miss!

So the totals you see in my test run (i.e. all totals combined) are all for TLB misses!  NONE of them are for TLB hits!

The actual TLB logic (TLB lookup) is in the MADDRL macro, not in the logical_to_main_l function!

It is the MADDRL macro that checks the TLB fields, and if all conditions are met for a hit, returns the corresponding mainstor address via the MAINADDR macro.

If any of the conditions fail (i.e. a TLB miss), then logical_to_main_l is called to perform a full address translation.

(Doh!)  Sorry for the false alarm folks!  My bad. :(

I'm going to now take a closer look at the MADDRL macro code just for my own education purposes. It may be a wild goose chase as far as this issue is concerned, but I really want to better understand Greg's whole TLB lookup handling. Maybe we might be able to improve upon it some?

ivan-w commented 5 years ago

It is.. But then again it isn't! Your test seems to have uncovered that there is something awfully wrong in the MADDRL macro.

For those watching and not familiar with the intricacies of Hercules MMU:

MADDRL is the macro that allows one to get the actual memory address on the computer running Hercules from a virtual emulated address, from any level of virtualisation, all the way down.

MADDRL checks the TLB directly and usually returns quite fast by doing a few checks on PSW flags, TLB entries, etc then is done... most of the time... in usually less than 100 instructions.

The logical_to_main_l function does the same but takes the long route - basically when no TLB entry is eligible. That can take thousands if not millions of instructions on the host computer to complete, because of all the intricacies of the S/370, S/390 and z/Arch MMU logic (not to mention going down to SIE). Of course this is usually a one time shot.. When the address is resolved, the TLB entry is set up and the next access (to that page) will be a direct access by MADDRL (until the next PSW key change, call to PTLB, etc).

The TLB however has a limited number of entries (it doesn't cover the entire address space), so it only matches a certain portion of the address (the page number). So one could construct a program to fool the TLB accelerator (accessing in succession the same page number but on different segments or regions), but that'd be by design - and quite unrealistic.

ivan-w commented 5 years ago

Did a bit of reshuffle of the DAT code.. There were some inconsistencies.

Fish-Git commented 5 years ago

The new zip contains the exec and the cms and zcms traces.

Thanks! I'll take a look at it as soon as I can.

Thank you, Ian.

I took some time the other day to take a close look at the instruction traces you gave me, but could not see anything unusual to explain the unusual slowdown seen when running toto exec under zcms as opposed to running under cms. There were very few out of the ordinary instructions being executed in either trace from what I could see. The traces showed that virtually all of the instructions were quite uninteresting ordinary instructions, none of which appear to be inefficiently coded.

So a popular but poorly optimized instruction being emulated by Hercules is definitely not the cause of the slowdown under z/Arch (zcms) that we're seeing.

It has to be something else.

ivan-w commented 5 years ago

I have a solution.. the problem is I am not sure what happened ! So I won't commit just yet until I figure it out.

ivan-w commented 5 years ago

Can someone please review this for any logic error ?

Just trying to split the logic into individual components. The compiler optimizes all this out anyway.

diff --git a/dat.h b/dat.h
index 7cdd614a..6c95e80d 100644
--- a/dat.h
+++ b/dat.h
@@ -54,9 +54,17 @@ static inline    BYTE    *ARCH_DEP(maddr_l)(VADR _addr,size_t _len,int _arn,REGS
     if(likely((_regs)->AEA_AR((_arn))))
     {
         /* More Access register comment here required */
-        if(likely(((_regs)->CR((_regs)->AEA_AR((_arn))) == (_regs)->tlb.TLB_ASD(TLBIX(_addr)))
-                    || ((_regs)->AEA_COMMON((_regs)->AEA_AR((_arn))) & (_regs)->tlb.common[TLBIX(_addr)])
-                ))
+        int c1;
+        int c2;
+        int arc1_c1;
+        int arc1_c2;
+
+        arc1_c1=(_regs)->CR((_regs)->AEA_AR((_arn)));
+        arc1_c2=(_regs)->tlb.TLB_ASD(TLBIX(_addr));
+        c1=arc1_c1==arc1_c2;
+
+        c2=((_regs)->AEA_COMMON((_regs)->AEA_AR((_arn))) & (_regs)->tlb.common[TLBIX(_addr)]);
+        if(likely(c1 || c2))
         {
             /* Protection key check */
             if(likely((_akey) == 0 || (_akey) == (_regs)->tlb.skey[TLBIX(_addr)]))
ivan-w commented 5 years ago

I'm getting this now : i zcms
z/CMS V6.3.0 2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.04/0.05 08:31:45
toto
Ready; T=31.79/31.87 08:32:23
i cms
z/VM V6.3.0 2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.04/0.05 08:32:31
toto
Ready; T=30.49/30.56 08:33:04

***** VERY MINOR difference...

ivan-w commented 5 years ago

I can't commit this yet because there was no "apparent" logic change BUT the problem went away.. "APPARENT" is the key word.. I must've changed something in the logic and it might break something somewhere else, so it needs thorough review. However, this provides a nice hint as to where the problem lies (yes - it's in the DAT logic)

Fish-Git commented 5 years ago

@ivan-w

Ivan,

I suspect you might find the following to be VERY interesting!

What I did was convert the MADDRL macro into a function, and then added some test code to count how often a TLB hit and TLB miss was occuring, and for the misses, which of the 5 conditions was most frequently causing the miss to occur.  (see the attached patch(*) at the end for details)

  maddrl (previously MADDRL macro) conditions:  

  1. (regs->AEA_AR( arn ))

  2. ( (regs->CR( regs->AEA_AR( arn )) == regs->tlb.TLB_ASD( TLBIX( addr ))) || (regs->AEA_COMMON( regs->AEA_AR( arn )) & regs->tlb.common[ TLBIX( addr ) ]) )

  3. (akey == 0 || akey == regs->tlb.skey[ TLBIX( addr )])

  4. (((addr & TLBID_PAGEMASK) | regs->tlbID) == regs->tlb.TLB_VADDR(TLBIX(addr)))

  5. (acctype & regs->tlb.acc[ TLBIX( addr ) ])

  I then ran your toto exec test from a z/VM guest machine after both ipling cms (S/390 mode) as well as after ipling zcms (z/Arch mode). The results are below:

--------------------------------------------------------------------------------

z/VM 6.3, toto exec, S/390 guest (ipl cms)

HHC01603I $test fish start
HHC00001D fish: start!
HHC01603I $test fish stop
HHC00001D fish: stop!
HHC00001D
HHC00001D HOST:        8,187,873 HITS   ( 84.06%)
HHC00001D HOST:        1,552,716 MISSES ( 15.94%)
HHC00001D HOST:        9,740,589 TOTAL
HHC00001D                    170 times cond 1 was false (  0.01%)
HHC00001D                 11,339 times cond 2 was false (  0.73%)
HHC00001D                 38,970 times cond 3 was false (  2.51%)
HHC00001D                  7,979 times cond 4 was false (  0.51%)
HHC00001D              1,520,651 times cond 5 was false ( 97.93%)
HHC00001D                      0 times cond 6 was false (  0.00%)
HHC00001D                      0 times cond 7 was false (  0.00%)
HHC00001D                      0 times cond 8 was false (  0.00%)
HHC00001D                      0 times cond 9 was false (  0.00%)
HHC00001D
HHC00001D GUEST:   1,106,518,431 HITS   ( 99.99%)
HHC00001D GUEST:         133,735 MISSES (  0.01%)
HHC00001D GUEST:   1,106,652,166 TOTAL
HHC00001D                      0 times cond 1 was false (  0.00%)
HHC00001D                      0 times cond 2 was false (  0.00%)
HHC00001D                    707 times cond 3 was false (  0.53%)
HHC00001D                110,539 times cond 4 was false ( 82.66%)
HHC00001D                132,114 times cond 5 was false ( 98.79%)
HHC00001D                      0 times cond 6 was false (  0.00%)
HHC00001D                      0 times cond 7 was false (  0.00%)
HHC00001D                      0 times cond 8 was false (  0.00%)
HHC00001D                      0 times cond 9 was false (  0.00%)
HHC00001D

--------------------------------------------------------------------------------

z/VM 6.3, toto exec, z/Arch guest (ipl zcms)

HHC01603I $test fish start
HHC00001D fish: start!
HHC01603I $test fish stop
HHC00001D fish: stop!
HHC00001D
HHC00001D HOST:       16,776,030 HITS   ( 83.83%)
HHC00001D HOST:        3,235,888 MISSES ( 16.17%)
HHC00001D HOST:       20,011,918 TOTAL
HHC00001D                    128 times cond 1 was false (  0.00%)
HHC00001D                 19,033 times cond 2 was false (  0.59%)
HHC00001D                 81,326 times cond 3 was false (  2.51%)
HHC00001D                  6,935 times cond 4 was false (  0.21%)
HHC00001D              3,177,291 times cond 5 was false ( 98.19%)
HHC00001D                      0 times cond 6 was false (  0.00%)
HHC00001D                      0 times cond 7 was false (  0.00%)
HHC00001D                      0 times cond 8 was false (  0.00%)
HHC00001D                      0 times cond 9 was false (  0.00%)
HHC00001D
HHC00001D GUEST:          48,836 HITS   (  0.00%)
HHC00001D GUEST:   1,085,513,771 MISSES (100.00%)
HHC00001D GUEST:   1,085,562,607 TOTAL
HHC00001D                      0 times cond 1 was false (  0.00%)
HHC00001D          1,085,509,921 times cond 2 was false (100.00%)
HHC00001D                  1,215 times cond 3 was false (  0.00%)
HHC00001D                214,320 times cond 4 was false (  0.02%)
HHC00001D            182,085,121 times cond 5 was false ( 16.77%)
HHC00001D                      0 times cond 6 was false (  0.00%)
HHC00001D                      0 times cond 7 was false (  0.00%)
HHC00001D                      0 times cond 8 was false (  0.00%)
HHC00001D                      0 times cond 9 was false (  0.00%)
HHC00001D

--------------------------------------------------------------------------------

I'm not sure why condition 2 is failing (virtually all the time!), but that will definitely be my next goal!!

I'm going to stick some PTT traces in the maddl function to see what values are being passed to the function and what the values in the TLB entry are. Something very weird (or very wrong!) is obviously going on somewhere! :(


(*) NOTE: the attached patch was created before you did your "Move MADDRL Logic from feature.h to dat.h" commit, so it probably won't apply cleanly to current git:

I'll pull your commit and re-work my patch and post a new updated patch in a few minutes(?). I just wanted to get my test results posted as soon as I could since they're so enlightening!

ivan-w commented 5 years ago

Latest tests with my patch:

i cms
z/VM V6.3.0    2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.04/0.05 08:47:25

rexxcps
----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
 REXX version is: REXX370 4.02 01 Dec 1998
       System is: CMS
       Averaging: 10 measures of 30 iterations

     Performance: 72412 REXX clauses per second

Ready; T=4.10/4.12 08:47:32

 

i zcms
z/CMS V6.3.0   2018-08-17 16:08

DMSACP723I T (592) R/O
Ready; T=0.04/0.06 08:47:36

rexxcps
----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
 REXX version is: REXX370 4.02 01 Dec 1998
       System is: CMS
       Averaging: 10 measures of 30 iterations

     Performance: 72263 REXX clauses per second

Ready; T=4.11/4.13 08:47:44
ivan-w commented 5 years ago

I can't make heads or tail out of this... There was NO change in the logic - and BAM!

I am very hesitant to commit this change because it might contain a change that could affect AR mode in a matter that I am not aware of - and would affect z/OS, DB2 and whatnot.

(z/VM XC mode doesn't seem affected though)

I was only refactoring code!

If I refactor code and behavior changes, I get suspicious, even (and especially!) if it actually (and accidentally!) fixes the problem.

Fish-Git commented 5 years ago

Fascinating...

I'll have to pull your latest commit and try the toto exec tests again for myself too! (as well as a rexxcps test too)   (without my test patch installed of course)

Depending on the results I then might try it again with my test (research) patch installed to see if I still get the same results.

But I think one of us (either you or me!) has screwed up somewhere! Because like you said, your change shouldn't have changed the behavior any!

The fact that it appears it did somehow change Hercules's behavior is indeed, as you said, quite suspicious.

Give me a few(?) minutes and I'll post my own results ....

ivan-w commented 5 years ago

I didn't commit the patch yet - because I don't like committing a patch that is purely a refactoring and seems to alter the logic. Sure, it may seem to fix stuff.. but it's quite likely it's inducing a subtle change somewhere else and I certainly don't like that !

Fish-Git commented 5 years ago

I didn't commit the patch yet...

Ah. Okay. I thought it was your maddl function patch that was causing the new behavior. My bad!

Never mind. :)

ivan-w commented 5 years ago

Nah nah.. It's the refactoring of the 2nd test..The AR test. (not committed but described here). This code change (for some unknown reason) makes S/390 and z/Arch have performance in the same magnitude.

ivan-w commented 5 years ago

I am not going to commit a patch which has effects I don't understand! Even if it seems to accidentally fix a problem!

Fish-Git commented 5 years ago

I am not going to commit a patch which has effects I don't understand! Even if it seems to accidentally fix a problem!

I completely understand and agree 100% with that policy!

HOWEVER... can you share your patch with the rest of us? Maybe one of us might be able to figure out what's going on and maybe that'll be a clue!

ivan-w commented 5 years ago

I already did... it's in the discussion at https://github.com/SDL-Hercules-390/hyperion/issues/222#issuecomment-522229474

Fish-Git commented 5 years ago

I am not going to commit a patch which has effects I don't understand! Even if it seems to accidentally fix a problem!

I completely understand and agree 100% with that policy!

HOWEVER... can you share your patch with the rest of us? Maybe one of us might be able to figure out what's going on and maybe that'll be a clue!

I already did... it's in the discussion at #222 (comment)

(Doh!)  I completely missed those couple of comments of yours between my two comments since it looks like we each posted them at almost the exact same time.  (I hate it when that happens!)

Okay, looking over your patch, your bug appears obvious -- at least it is to me!

-        int c1;
-        int c2;
-        int arc1_c1;
-        int arc1_c2;
+        BYTE c1;
+        BYTE c2;
+        long long int arc1_c1;
+        long long int arc1_c2;

Where is the value (_regs)->CR((_regs)->AEA_AR((_arn))) ultimately coming from? CR! How is a CR defined in z/Arch mode? As a DW! (i.e. 64 bits! Not 32!)

Same thing with (_regs)->tlb.TLB_ASD(TLBIX(_addr)): the TLB_ASD value is a DW, just like CR.

The AEA_COMMON and tlb.common fields however, are simple BYTE values.

I'm guessing the reason you experienced different behavior is because, due to the bug in your patch, you were only comparing the low order 32-bits of the CR and TLB_ASD values, likely causing the comparison to succeed rather than fail had you compared the full 64 bits.

Maybe the root cause of our DAT bug is something similar?

ivan-w commented 5 years ago

You are definitely correct sir!

When changing arc1_c1 and arc1_c2 to U64, the performance is back to what it was before (i.e. dreadful!). Having them as an int was just hiding the problem, not solving it.

But it means we are closing in on the issue! We're going to get that sucka!

Fish-Git commented 5 years ago

We're going to get that sucka!

Too late!   I ALREADY FOUND IT!  :)))

Here's the quick fix(*) that works around the bug:

--- hyperion-1/sie.c    2019-08-07 03:23:15.530798900 -0700
+++ hyperion-0/sie.c    2019-08-18 04:31:20.059206700 -0700
@@ -325,9 +325,19 @@
 #if defined( FEATURE_001_ZARCH_INSTALLED_FACILITY )
     if (STATEBK->mx & SIE_MX_ESAME)
     {
-        GUESTREGS->arch_mode         = ARCH_900_IDX;
-        GUESTREGS->program_interrupt = &z900_program_interrupt;
+        /* If GUESTREGS was initialized for a different architecture,
+           then we need to initialize it for the new architecture */
+        if (GUESTREGS->arch_mode != ARCH_900_IDX)
+        {
+            /* REGS 'arch_mode' MUST be set BEFORE calling cpu_init! */
+            GUESTREGS->arch_mode = ARCH_900_IDX;
+            cpu_init( regs->cpuad, GUESTREGS, regs );

+            /* Now fix the ARCH_DEP bug that's in cpu_init */
+            GUESTREGS->CR_G( CR_ASD_REAL ) = TLB_REAL_ASD_G;
+        }
+
+        GUESTREGS->program_interrupt = &z900_program_interrupt;
         icode = z900_load_psw( GUESTREGS, STATEBK->psw );
     }
 #else /* !defined( FEATURE_001_ZARCH_INSTALLED_FACILITY ) */

We should probably go ahead commit the above workaround (after you and others have had a chance to try it and confirm that it's good first, of course!) so we can at least put this issue behind us and get on down the road.

Afterwards, we can then take our sweet time straightening out and carefully testing the changes that are going to obviously need to be made to cpu.c and any other similarly incorrectly coded modules.


(*) The actual bug is the overall structure/design of the cpu.c module itself! It's all wrong! Non-ARCH_DEP functions incorrectly placed in the ARCHDEP section of the module, a non_-ARCH_DEP function (cpu_init) using an ARCH_DEP constant (TLB_REAL_ASD, which is a HUGE no-no!), etc. It's a mess.

I think I've seen similar bugs in other areas of Hercules in the past too (but never got around to reporting them), which is why I created the _archdep_templ.c source member: it illustrates how an ARCH_DEP module should be structured and contains extensive comments pointing out the various requirements that such compiled-multiple-times-once-for-each-build-architecture type modules need to be careful to strictly adhere to and some of the dangers that occur when you fail to do so (one of which is what bit us in this particular case).

ivan-w commented 5 years ago

I'm going to say you should commit this change! This whole thing is related to the inherent intricacies of running multiple architecture at the same time which is what SIE does.

You may as well commit the refactoring of MADDRL with the correct types while you are at it too (as it just makes it easier to read and eventually debug).

ivan-w commented 5 years ago

Looks good to me ! (With both your fix and my refactoring WITH the correct types).

i cms                                                   
z/VM V6.3.0    2018-08-17 16:08                         

DMSACP723I T (592) R/O                                  
Ready; T=0.04/0.06 09:26:12                             
toto                                                    
Ready; T=29.93/29.99 09:26:44                           
i zcms                                                  
z/CMS V6.3.0   2018-08-17 16:08                         

DMSACP723I T (592) R/O                                  
Ready; T=0.04/0.06 09:26:48                             
toto                                                    
Ready; T=31.44/31.51 09:27:21                           
i cms                                                   
z/VM V6.3.0    2018-08-17 16:08                         

DMSACP723I T (592) R/O                                  
Ready; T=0.04/0.05 09:27:27                             
rexxcps                                                 
----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
 REXX version is: REXX370 4.02 01 Dec 1998              
       System is: CMS                                   
       Averaging: 10 measures of 30 iterations          

     Performance: 73564 REXX clauses per second         

Ready; T=4.04/4.06 09:27:35                             
i zcms                                                  
z/CMS V6.3.0   2018-08-17 16:08                         

DMSACP723I T (592) R/O                                  
Ready; T=0.04/0.05 09:27:38                             
rexxcps                                                 
----- REXXCPS 2.2 -- Measuring REXX clauses/second -----
 REXX version is: REXX370 4.02 01 Dec 1998              
       System is: CMS                                   
       Averaging: 10 measures of 30 iterations          

     Performance: 72070 REXX clauses per second         

Ready; T=4.12/4.14 09:27:46                             
ivan-w commented 5 years ago

However.. I'm going to say we need to keep this issue open (for a limited amount of time) until we have people running MVS/XA+ under VM and verify this doesn't cause any problem since AR/DAS and Addressing mode switching is only really used by MVS.. CMS uses its own version (XC especially for the SFS virtual machines[*]), but that seems to be working OK!

[*] SFS is a file sharing technology based on the SQL/DS (now DB2) engine, which was why XC mode was created in the first place.

Fish-Git commented 5 years ago

I'm going to say you should commit this change!

Will do! I'm still running some more of my own tests just to be extra sure it's good. When I'm done (later today or tomorrow) I'll go ahead and commit it.

However.. I'm going to say we need to keep this issue open (for a limited amount of time) until we have people running MVS/XA+ under VM and verify this doesn't cause any problem...

10-4. Seems prudent to me.

wably commented 5 years ago

I confirm these new results, using the toto exec.

However, I also have an application that runs full time in z/Architecture mode with DAT on that creates a S/370 virtual machine for the purpose of running S/370 OS'es in a z/VM virtual machine. The S/370 guest OS does not show any performance increase or decrease after the patches were applied.

This is not a problem at all. I was simply hoping for a freebie performance boost as a result of the patches in this environment as well. In any case, everything seems to be working fine with the patches.