SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

CLCLE performance improvement #497

Closed JamesWekel closed 2 years ago

JamesWekel commented 2 years ago

Fish,

RE: Issue: Performance of some instructions could be significantly improved #101

Here is a rework of the CLCLE instruction based on the prior improvement of the CLCL instruction. The associated tests are also a rework of the CLCL tests for just the CLCLE instruction. I can't believe the performance improvement, so I'm probably missing something that might be obvious to you. Based on three tests before and after the change, the improvement for 1,000,000 executions was:

Before : 4.281 sec
After: .031 sec Improvement: 99.3%

Hope this look ok to you as I am still in learning mode on both the code base and the tools set..

Jim

Fish-Git commented 2 years ago

I can't believe the performance improvement, so I'm probably missing something that might be obvious to you. Based on three tests before and after the change, the improvement for 1,000,000 executions was:

Before : 4.281 sec After: .031 sec Improvement: 99.3%

I can't believe that either! That's a bit too hard to believe!

Let me take a look at it a bit more closely and I'll get back to you.

p.s. Your "Minor correction" commit fixed your typo in the Hercules_VS2022.vcxproj.filters file, but it still exists in the other .vcxproj.filters files (VS2015, VS2017 and VS2019). You'll need to fix that before I can accept your PR. But let me take a closer look at your code first. I'm having a hard time accepting such a dramatic speed improvement!

p.p.s. Your tests/CLCLE-03-performance.tst file is also wrong. To do a speed test (which is the only thing the "CLCLE-03-performance" test is supposed to be doing, yes? That is to say, it's not designed to "test" anything, right? (That's what the other two tests are designed to do.) It's only purpose is to perform a speed test and nothing more, yes?), you need to be setting the flag at 21fd to FF. That's what triggers the speed test. That is to say, the following lines:

##r           21fd=ff   # (enable timing tests too!)
##runtest     300       # (TIMING too test duration)
runtest     1         # (NON-timing test duration)

Should be replaced with the following lines instead:

r           21fd=ff   # (enable timing tests too!)
runtest     300       # (TIMING too test duration)
##runtest     1         # (NON-timing test duration)

That will trigger the speed test logic and issue a message to the Hercules console with the calculated speed. On my own system I'm getting:

Current Hercules:

18:20:41 HHC02333I Script 1: test: running...
18:20:54 /         1,000,000 iterations of CLCLE took  12,227,569 microseconds
18:20:54 HHC00809I Processor CP00: disabled wait state 000A0000 00000000
18:20:54 HHC02334I Script 1: test: test ended
18:20:54 HHC02338I Script 1: test: actual duration: 12.312980 seconds

With your improved CLCLE logic:

18:21:06 HHC02333I Script 1: test: running...
18:21:06 /         1,000,000 iterations of CLCLE took     102,824 microseconds
18:21:06 HHC00809I Processor CP00: disabled wait state 000A0000 00000000
18:21:06 HHC02334I Script 1: test: test ended
18:21:06 HHC02338I Script 1: test: actual duration: 0.179795 seconds

Which I'm having a hard time believing!

Which is why I want to look at your code a bit closer.

In the mean time, you can make the other suggested fixes: fix your typo in the other vcxproj filters files, and fix your 03-performance .tst script.

Thanks! You're continuing to impress the hell out of me, James!!   :))

JamesWekel commented 2 years ago

Fish,

  1. Thanks for the reminder that I needed to fix the other Hercules_VS20??.vcxproj.filters files. Tunnel vision in correcting the error that I saw when checking the solution in VS 2022. Minor corrections have been applied to the other filter files.

  2. The CLCLE-03-performance.tst does additional tests in addition to the 'optional' performance test. It is a copy of 'CLCL-et-al.tst' altered to just test the CLCLE instruction. I wanted to highlight that this test could do the optional performance testing, thus the name change. The CLCL-el-al.tst default is 'off' for the performance tests for CLC, CLCL, MVCIN and TRT instructions.

I didn't want to want to turn on the performance test if you were running all the tests for a release check. I could alter the test to always do the performance test. Your call?

  1. All the credit goes to the author of the mem_cmp and mem_pad_cmp functions and the CLCL instruction. I'm still learning and just doing copy, paste, edit, and test to try to match the PoP instruction description. It helps that there was a working copy of CLCLE from which to start. So, I'm just doing simple stuff to learn the Hercules structure. So many macros and page boundary conditions to consider. One step at a time.

Thank you for the quick review.

Jim

Fish-Git commented 2 years ago
  1. The CLCLE-03-performance.tst does additional tests in addition to the 'optional' performance test. It is a copy of 'CLCL-et-al.tst' altered to just test the CLCLE instruction. I wanted to highlight that this test could do the optional performance testing, thus the name change. The CLCL-el-al.tst default is 'off' for the performance tests for CLC, CLCL, MVCIN and TRT instructions.

I didn't want to want to turn on the performance test if you were running all the tests for a release check. I could alter the test to always do the performance test. Your call?

Hmmm... okay. I see your point now. CLCLE-03 is indeed verifying correct CLCLE functionality above and beyond the tests you're doing in CLCLE-01 and CLCLE-02. So you're right: it is indeed still needed. Leave CLCLE-03-performance.tst as you currently have it. My bad.

  1. All the credit goes to the author of the mem_cmp and mem_pad_cmp functions and the CLCL instruction. I'm still learning and just doing copy, paste, edit, and test to try to match the PoP instruction description. It helps that there was a working copy of CLCLE from which to start. So, I'm just doing simple stuff to learn the Hercules structure. So many macros and page boundary conditions to consider. One step at a time.

That would be me.   :)

I wrote the mem_cmp and mem_pad_cmp functions. But I can't take all the credit, as they're basically doing the same thing as what the e.g. CLC instruction was already doing. All I did was make that same logic into a callable function that other "Compare Logical" type instructions could then use.

Thank you for the quick review.

You're welcome. I'm currently bogged down looking into something else right now, so I haven't had a chance yet to review your code more closely like I promised. But I will, I promise you.

Given how absolutely shitty the original CLCLE implementation was (comparing ONE BYTE AT A TIME?! Are you freaking kidding me?!), I guess I shouldn't be too surprised by the performance gain. But still, the gain is way above and beyond what I would have expected.

So your code might be absolutely fine! Or ... It could be missing something. I don't know yet. I still need to look at it a bit more closely. But I just wanted to let you know I'm kind of busy working on something else right now, so a closer review will have to wait a bit. Your patience is appreciated.

Thanks.

(In case I forget about it, poke me if I don't respond with my final review within a "reasonable" amount of time)

JamesWekel commented 2 years ago

Fish,

We are all get bogged down at times, so review the code whenever you have time. Since you 'wrote the mem_cmp and mem_pad_cmp functions', you are the best person to do the review.

The next mini-learning project is to review the TRE (Translate Extended) instruction.

Thanks again, Jim

JamesWekel commented 2 years ago

Fish,

I've been considering your comments on the CLCLE-03-performance.tst. I've decided to split this test into two tests, one as a basic function test and another for performance testing.

As performance test cases may run multiple tests, the default for a performance test will be off, i.e. a nop test. A manual change would be required to enable a performance test.

As I have a few changes to make, I'm closing this pull request. I'll open a new request in a few days.

Thanks Jim

Fish-Git commented 2 years ago

As I have a few changes to make, I'm closing this pull request. I'll open a new request in a few days.

10-4. Thanks, James. (or do you prefer to be called Jim?)