SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

CU14 - Performance Improvement #623

Closed JamesWekel closed 6 months ago

JamesWekel commented 6 months ago

Fish,

Here is a proposed performance improvement for the CU14 instruction.

Before the change, the performance test reported:

        1,000,000 iterations of CU14  took     846,440 microseconds
        1,000,000 iterations of CU14  took     879,949 microseconds
        1,000,000 iterations of CU14  took     855,963 microseconds
        1,000,000 iterations of CU14  took     855,836 microseconds
        1,000,000 iterations of CU14  took  73,516,592 microseconds

and after the change:

        1,000,000 iterations of CU14  took     172,939 microseconds
        1,000,000 iterations of CU14  took     200,985 microseconds
        1,000,000 iterations of CU14  took     204,565 microseconds
        1,000,000 iterations of CU14  took     223,280 microseconds
        1,000,000 iterations of CU14  took  10,825,924 microseconds

Going by the figures above, elapsed time drops by roughly 74% to 85% across the five tests.
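For readers who want to check the percentages, here is a minimal sketch that derives the per-test reduction in elapsed time from the two listings above. The timings are copied from this comment; the function and variable names are illustrative only and are not part of the Hercules test suite.

```python
# Timings in microseconds, copied from the before/after listings above.
before = [846_440, 879_949, 855_963, 855_836, 73_516_592]
after = [172_939, 200_985, 204_565, 223_280, 10_825_924]


def percent_reduction(old_us: int, new_us: int) -> float:
    """Percentage by which elapsed time dropped going from old_us to new_us."""
    return (old_us - new_us) / old_us * 100.0


# One entry per test, in the order the tests were reported.
reductions = [percent_reduction(b, a) for b, a in zip(before, after)]

for b, a, r in zip(before, after, reductions):
    print(f"{b:>12,} -> {a:>12,} microseconds: {r:5.1f}% less elapsed time")
```

The reduction is measured relative to the "before" time, so it can never exceed 100% no matter how large the speedup is.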

It has been more than a year since my last pull request, so I'm sure that I have missed something! I appreciate your review.

Once the CU14 performance improvement is closed, I'll work on CU12 to finish Issue #101.

Jim

JamesWekel commented 6 months ago

Fish,

Change made. I would prefer not to rebase my CU14 branch to pick up your two minor commits. I always mess up my git rebase attempts.

Thanks for the quick review.

Jim

Fish-Git commented 6 months ago

> I would prefer not to rebase my CU14 branch to pick up your two minor commits. I always mess up my git rebase attempts.

That's fine. As I said, no biggie.

Fish-Git commented 6 months ago

P.S. Is there a reason your two tests are using mainstor 16? Isn't that overkill? Looking at your tests, it seems to me that 4MB would probably be plenty!

Again, no big deal.

Fish-Git commented 6 months ago

SEMI-RELATED QUESTION:

What kind of processor does your system have?

My system's processors are 2.93GHz X5570 Intel Xeons (it's an older system), and the speeds I'm getting are:

Before:

    1,000,000 iterations of CU14  took   2,588,625 microseconds
    1,000,000 iterations of CU14  took   2,352,916 microseconds
    1,000,000 iterations of CU14  took   2,352,945 microseconds
    1,000,000 iterations of CU14  took   2,413,947 microseconds
    1,000,000 iterations of CU14  took 195,244,598 microseconds

After:

    1,000,000 iterations of CU14  took     439,171 microseconds
    1,000,000 iterations of CU14  took     511,245 microseconds
    1,000,000 iterations of CU14  took     515,241 microseconds
    1,000,000 iterations of CU14  took     624,051 microseconds
    1,000,000 iterations of CU14  took  32,890,569 microseconds

Which is indeed an 83% performance improvement (or roughly 6 times faster! WELL DONE, James!), but notice how much slower my times are compared to yours! Your system is a good 3 times faster than mine!!

So I'm curious: What kind of processors do you have, and how fast are they running??
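As a rough cross-check of the "times faster" figures, the sketch below turns the timings quoted in this thread into ratios: a speedup factor is the slower time divided by the faster one, which equals 1 / (1 - reduction). The timings are copied from the two comments; the names `fish_*` and `james_*` are illustrative labels for the two systems, not identifiers from the test scripts.

```python
# Timings in microseconds, copied from the listings in this thread.
fish_before = [2_588_625, 2_352_916, 2_352_945, 2_413_947, 195_244_598]
fish_after = [439_171, 511_245, 515_241, 624_051, 32_890_569]
james_after = [172_939, 200_985, 204_565, 223_280, 10_825_924]


def speedup(slow_us: int, fast_us: int) -> float:
    """How many times faster the fast_us run is than the slow_us run."""
    return slow_us / fast_us


# Effect of the code change on one system (before vs. after).
change_speedups = [speedup(b, a) for b, a in zip(fish_before, fish_after)]

# Difference between the two systems running the same (changed) code.
machine_ratios = [speedup(f, j) for f, j in zip(fish_after, james_after)]

for n, (c, m) in enumerate(zip(change_speedups, machine_ratios), start=1):
    print(f"test {n}: change = {c:.2f}x faster, other system = {m:.2f}x faster")
```

Because speedup = 1 / (1 - reduction), an 83% reduction in elapsed time corresponds to a factor of about 1 / 0.17, or a little under 6.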

JamesWekel commented 6 months ago

Fish,

I do appreciate all the review, including the comments. I usually have spelling errors in comments (as they are not tested!) and I don't read them after a while.

I reduced the mainstor size to 4 MB for CU14-01-xpage and to 8 MB for CU14-02-performance (which currently has a reference above the 6M boundary). I could reduce the CU14-02 memory size further, but that would require a code change, and I'm a bit lazy.

My performance numbers were from an 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz.

Jim