SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

CU12 Performance Improvement #625

Closed JamesWekel closed 6 months ago

JamesWekel commented 6 months ago

Fish,

Here is a proposed performance improvement for the CU12 instruction.

Before the change, the performance test reported:

         1,000,000 iterations of CU12  took     925,907 microseconds
         1,000,000 iterations of CU12  took     899,111 microseconds
         1,000,000 iterations of CU12  took     930,536 microseconds
         1,000,000 iterations of CU12  took     909,909 microseconds
         1,000,000 iterations of CU12  took 121,023,956 microseconds

and after the change:

        1,000,000 iterations of CU12  took     173,238 microseconds
        1,000,000 iterations of CU12  took     199,005 microseconds
        1,000,000 iterations of CU12  took     205,140 microseconds
        1,000,000 iterations of CU12  took     242,777 microseconds
        1,000,000 iterations of CU12  took  19,210,274 microseconds

Across 4 performance tests, the elapsed time drops by 74% to 85%.
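For reference, the percentages here are reductions in elapsed time, computed as (before - after) / before. The sketch below is only an illustration of that arithmetic applied to the two listings above; the function names `pct_reduction` and `print_reductions` are made up for this example and are not part of the test program.

```c
#include <stdio.h>

/* Percentage reduction in elapsed time going from before_us to after_us.
 * Hypothetical helper for illustration; not part of the Hercules tests. */
double pct_reduction(double before_us, double after_us)
{
    return 100.0 * (before_us - after_us) / before_us;
}

/* Prints the reduction for each before/after pair quoted in the listings. */
void print_reductions(void)
{
    static const double before_us[] = { 925907, 899111, 930536, 909909, 121023956 };
    static const double after_us[]  = { 173238, 199005, 205140, 242777,  19210274 };
    size_t n = sizeof before_us / sizeof before_us[0];

    for (size_t i = 0; i < n; i++)
        printf("line %zu: %.1f%% less elapsed time\n",
               i + 1, pct_reduction(before_us[i], after_us[i]));
}
```

For the single pair of listings pasted above, the per-line figures come out between roughly 73% and 84%, which is in line with the range reported across all the test runs.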

Note: The change uses CSWAP16 and CSWAP32. I hope I'm using them correctly, hence the request for your review.
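As background for the review question: assuming CSWAP16 and CSWAP32 act as conditional byte swaps (a swap on a little-endian host, a no-op on a big-endian one), the idea can be sketched stand-alone as below. Every name here (`host_is_little_endian`, `swap16`, `swap32`, `cond_swap16`, `store_be16`) is a hypothetical stand-in, not the Hercules definition, and storing a UTF-16 code unit is just one plausible use in a UTF-8 to UTF-16 conversion such as CU12; the PR's actual code is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the Hercules macros. */

/* Returns 1 when the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Unconditional byte swaps. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t swap32(uint32_t v)
{
    return (v << 24) | ((v << 8) & 0x00FF0000u)
         | ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* Conditional swap: converts a host-order value to big-endian order. */
uint16_t cond_swap16(uint16_t v)
{
    return host_is_little_endian() ? swap16(v) : v;
}

/* Stores one UTF-16 code unit in big-endian (guest) byte order using a
 * single 2-byte copy instead of two separate byte stores. */
void store_be16(uint8_t *dst, uint16_t unit)
{
    uint16_t be = cond_swap16(unit);
    memcpy(dst, &be, sizeof be);
}
```

Under these assumptions, `store_be16(buf, 0x20AC)` leaves the bytes 0x20 0xAC in `buf` regardless of host byte order, which is the property a reviewer would want to check for any use of a conditional swap.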

Maybe this closes Issue https://github.com/SDL-Hercules-390/hyperion/issues/101?

Thanks, Jim

JamesWekel commented 6 months ago

Fish, sorry, I had the wrong base branch! Jim

Fish-Git commented 6 months ago

Sorry, I had the wrong base branch!

No big deal. You're only missing one small change I committed just yesterday, and it's not included in your Pull Request of course (only your CU12 changes are), so it doesn't impact your Pull Request at all.

As you can see, I've already very quickly approved your PR this time since it's virtually identical to the technique you used for CU14. Well done! (Again!) You are a true Herculean, James! THANK YOU!