SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

CLCLE instruction performance #499

Closed JamesWekel closed 2 years ago

JamesWekel commented 2 years ago

Fish,

"(or do you prefer to be called Jim?)" Generally Jim, but usually anything written is James.

Here is the updated CLCLE instruction performance pull request, with the performance test split into a basic function test and a performance test that is a no-op unless manually enabled. I've included the following comment in CLCLE-04-performance.tst for emphasis.

.# ----------------------------------------------------------------------------------
.# This ONLY tests the performance of the CLCLE instruction.
.#
.# The default is to NOT run performance tests. To enable
.# performance test, uncomment the "#r 21fd=ff "
.# line below.
.#
.# Tests:
.#   1. CLCLE of 512 bytes
.#   2. CLCLE of 512 bytes where operand 1 crosses a page boundary
.#   3. CLCLE of 2048 bytes
.#   4. CLCLE of 2048 bytes where operand 1 crosses a page boundary
.#   5. CLCLE of 2048 bytes where both operand 1 and operand 2
.#      crosses a page boundary
.#
.# Output:
.#   For each test, a console line will the generated with timing results,
.#   as follows:
.#
.#   / 1,000,000 iterations of CLCLE took 38,698 microseconds
.#   / 1,000,000 iterations of CLCLE took 48,617 microseconds
.#   / 1,000,000 iterations of CLCLE took 49,178 microseconds
.#   / 1,000,000 iterations of CLCLE took 68,355 microseconds
.#   / 1,000,000 iterations of CLCLE took 69,991 microseconds
.# ----------------------------------------------------------------------------------
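
For anyone wanting to compare runs, the console lines described in that comment are easy to pull apart. Below is a minimal Python sketch that is not part of the pull request or of Hercules; the names `LINE_RE`, `parse_timing`, and `ns_per_iteration` are hypothetical, and it assumes the line format is exactly as shown in the comment above.

```python
import re

# Assumed format, taken from the comment block above:
#   / 1,000,000 iterations of CLCLE took 38,698 microseconds
LINE_RE = re.compile(r"([\d,]+) iterations of CLCLE took ([\d,]+) microseconds")


def parse_timing(line):
    """Return (iterations, microseconds) from one console line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    iterations = int(match.group(1).replace(",", ""))
    microseconds = int(match.group(2).replace(",", ""))
    return iterations, microseconds


def ns_per_iteration(line):
    """Average cost of one CLCLE iteration in nanoseconds, or None."""
    parsed = parse_timing(line)
    if parsed is None:
        return None
    iterations, microseconds = parsed
    return microseconds * 1000 / iterations


if __name__ == "__main__":
    sample = "/ 1,000,000 iterations of CLCLE took 38,698 microseconds"
    print(parse_timing(sample))
    print(ns_per_iteration(sample))
```

Lines that do not match the assumed format return `None` rather than raising, so other console output can be fed through unchanged.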

CLCLE-04-performance.tst runs 5 tests. On my system, the performance results before the change were:
/ 1,000,000 iterations of CLCLE took 9,052,879 microseconds
/ 1,000,000 iterations of CLCLE took 9,096,818 microseconds
/ 1,000,000 iterations of CLCLE took 35,928,188 microseconds
/ 1,000,000 iterations of CLCLE took 35,424,265 microseconds
/ 1,000,000 iterations of CLCLE took 36,205,255 microseconds

and after the change:
/ 1,000,000 iterations of CLCLE took 41,159 microseconds
/ 1,000,000 iterations of CLCLE took 49,770 microseconds
/ 1,000,000 iterations of CLCLE took 51,996 microseconds
/ 1,000,000 iterations of CLCLE took 69,810 microseconds
/ 1,000,000 iterations of CLCLE took 74,077 microseconds

There are 5 tests because I was curious about the impact of operands crossing page boundaries with the revised instruction. But a reduction of over 99% from the original timing still works for me. And the reduction is all from your mem_cmp function!
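
The reduction can be checked directly from the before and after timings quoted in this comment. The following is only a sketch of that arithmetic in Python; the helper names `reduction_percent` and `speedup` are made up for illustration, and the numbers are copied from the two result lists above.

```python
# Timings in microseconds per 1,000,000 iterations, copied from this comment.
before_us = [9_052_879, 9_096_818, 35_928_188, 35_424_265, 36_205_255]
after_us = [41_159, 49_770, 51_996, 69_810, 74_077]


def reduction_percent(before, after):
    """Percentage of the original elapsed time that was eliminated."""
    return (before - after) / before * 100.0


def speedup(before, after):
    """How many times faster the revised instruction ran."""
    return before / after


if __name__ == "__main__":
    for test_no, (b, a) in enumerate(zip(before_us, after_us), start=1):
        print(
            f"test {test_no}: {reduction_percent(b, a):.2f}% reduction, "
            f"{speedup(b, a):.0f}x faster"
        )
```

All five tests come out above a 99% reduction, and the 2048-byte cases gain the most because their original timings were roughly four times those of the 512-byte cases.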

Again, no rush in the review.

Jim

JamesWekel commented 2 years ago

Sorry, cancelled as wrong branch.

Jim