Fish-Git commented 6 years ago

A quick review of some instructions reveals they are very poorly coded, performance-wise. Most of them have not been modified since the repository was initially set up over 17 years ago and use vfetchb and vstoreb to process their operands one byte at a time (which is extremely inefficient!):

Note: Instructions which are checked have been recently modified and now no longer need to be improved. The ones that are unchecked are those whose performance still needs to be improved.

[x] B241 CKSM Checksum (completed by Bob Polmanter)
[x] A9 CLCLE Compare Logical Long Extended (contributed by James Wekel)
[x] B25D CLST Compare Logical String (completed by Bob Polmanter)
[x] B257 CUSE Compare Until Substring Equal (contributed by James Wekel)
[x] B2A7 CU12 Convert UTF-8 to Unicode (contributed by James Wekel)
[x] B9B0 CU14 Convert UTF-8 to UTF-32 (contributed by James Wekel)
[x] B255 MVST Move String (completed by Bob Polmanter)
[x] B25E SRST Search String (completed by Bob Polmanter)
[x] B2A5 TRE Translate Extended (contributed by James Wekel)
[x] B9BF TRTE Translate and Test Extended (contributed by James Wekel)
[x] D0 TRTR Translate and Test Reverse (contributed by James Wekel)
[x] B9BD TRTRE Translate and Test Reverse Extended (contributed by James Wekel)

I believe that by using the same or similar techniques as were used to improve the performance of the CLCL, MVCIN and TRT instructions (see issue #99: Poor performance of CLCL, MVCIN and TRT instructions), the performance of each of them could be significantly improved (very likely by at least one, and possibly two, orders of magnitude!).

I would like to invite any/all Hercules developers to please try their hand at doing so.

I could use some help here!

This is a low priority item, but really should eventually be looked into. While I have not performed any measurements in this area, I rather suspect some of these instructions(*) might well be used quite extensively.

Bottom line: there's lots of room for improvement here!

(*) The MVST (Move String) and SRST (Search String) instructions appear to have been specifically written to handle the C/C++ strcpy and strchr library functions, for example, and TRTE is a perfect candidate for the C/C++ memchr library function, etc.
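
To make the byte-at-a-time cost concrete, here is a minimal sketch in C contrasting a per-byte search with a per-page search. It is not Hercules code: `fetch_byte` is a hypothetical stand-in for a per-byte accessor such as vfetchb, `SKETCH_PAGE_SIZE` is an assumed 4K page size, and guest storage is modeled as a flat array with no address translation or access checking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for a per-byte fetch: in a real emulator each
   call would also pay for address translation and access checking. */
static unsigned char fetch_byte(const unsigned char *mem, size_t addr)
{
    return mem[addr];
}

/* Byte-at-a-time search of [start, end): one fetch call per byte.
   Returns the offset of 'ch', or 'end' if it is not present. */
size_t search_bytewise(const unsigned char *mem, size_t start, size_t end,
                       unsigned char ch)
{
    for (size_t a = start; a < end; a++)
    {
        if (fetch_byte(mem, a) == ch)
            return a;
    }
    return end;
}

/* Chunked search of [start, end): one memchr call per page touched.
   Returns the same value as search_bytewise. */
size_t search_chunked(const unsigned char *mem, size_t start, size_t end,
                      unsigned char ch)
{
    size_t a = start;

    while (a < end)
    {
        size_t to_page_end = SKETCH_PAGE_SIZE - (a % SKETCH_PAGE_SIZE);
        size_t len = end - a;

        if (len > to_page_end)
            len = to_page_end;

        const unsigned char *hit = memchr(mem + a, ch, len);
        if (hit)
            return (size_t)(hit - mem);

        a += len;
    }
    return end;
}
```

Both functions return the same offset; the chunked form simply does its per-page bookkeeping once per page instead of once per byte.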

Fish-Git commented 6 years ago

Since @ivan-w has volunteered in Issue #108 to look into the TRTR, TRTE and TRTRE (and hopefully TRE too!) instructions, I am marking this issue as "In Progress...".

Apparently(*) however, I cannot assign this issue to him until he first posts a comment to this GitHub Issue. Once he does then I'll do that.

(*) The only choices GitHub appears to offer for assignment are individuals participating in a given issue's conversation (i.e. only those that choose to reply). Go figure.

Fish-Git commented 6 years ago

The above is no longer true now that @ivan-w is now a member of our organization, so I've gone ahead and assigned this issue to Ivan.

Ivan? If this is unacceptable, then please feel free to remove yourself as the assignee!

wably commented 6 years ago

@Fish-Git I think I will sign up to do the three string instructions on the list: CLST, MVST, and SRST. Unless you know that someone else is working on them?

All three of these could benefit from using techniques similar to those already used for some other instructions mentioned, making allowances for access exceptions in the right order and considerations for page boundaries.

Fish-Git commented 6 years ago

I think I will sign up to do the three string instructions on the list: CLST, MVST, and SRST. Unless you know that someone else is working on them?

Go for it! PLEASE! :)

Ivan volunteered (or implied he volunteered) in another issue (#108), but I have no idea whether or not he is actually working on it. You should probably coordinate your effort with him off-list. He may just be busy with work or real life. (whatever the heck those things are!)

Thanks!

wably commented 5 years ago

Since nothing seemed to be happening here I went ahead and completed improvements to the three string instructions, MVST, SRST, and CLST. All three had their execution times reduced by over 80%. Below summarizes the results:

inst	before (s)	after (s)	reduction
MVST	9.62	1.21	87%
SRST	5.40	1.01	81%
CLST	10.05	1.67	83%

All three timings were moves, searches, or compares across three page frames, with 100,000 iterations each. All times are in seconds. The host computer was an Intel i7. SRST 'before' results are approximately one-half of the others because only one operand has to be incremented. The execution times were improved by implementing the page boundary dance for each operand thereby avoiding a vfetch( ) or vstore( ) for every byte.
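
For readers unfamiliar with the "page boundary dance", the following is a rough C sketch of the idea for a two-operand, MVST-like move. It is not the committed code: the function name, `SKETCH_PAGE_SIZE`, the flat `mem` array and the `max` bound are all assumptions made for illustration, and the operands are assumed not to overlap destructively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Bytes remaining in the page that contains 'addr'. */
static size_t bytes_to_page_end(size_t addr)
{
    return SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE);
}

/* Copy bytes from 'src' to 'dst' up to and including the terminator
   'term', at most 'max' bytes. Each pass handles the largest run that
   stays inside the current page of BOTH operands, so per-page work
   (translation, access checks in a real emulator) happens once per
   page rather than once per byte. Returns the number of bytes copied
   and sets *found when the terminator was copied. */
size_t move_string_chunked(unsigned char *mem, size_t dst, size_t src,
                           unsigned char term, size_t max, int *found)
{
    size_t copied = 0;

    *found = 0;

    while (copied < max && !*found)
    {
        size_t len   = max - copied;
        size_t d_len = bytes_to_page_end(dst + copied);
        size_t s_len = bytes_to_page_end(src + copied);

        if (len > d_len) len = d_len;
        if (len > s_len) len = s_len;

        const unsigned char *from = mem + src + copied;
        const unsigned char *hit  = memchr(from, term, len);

        if (hit)
        {
            len    = (size_t)(hit - from) + 1;   /* include terminator */
            *found = 1;
        }

        memmove(mem + dst + copied, from, len);
        copied += len;
    }
    return copied;
}
```

A three-page move of this shape costs a handful of `memchr`/`memmove` calls instead of thousands of per-byte fetch and store calls.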

I will commit this code in a day or two. I just want to double check everything and ensure that I have all of the test cases covering all condition codes and operand combinations (crossing pages, not crossing, etc).

Fish-Git commented 5 years ago

Since nothing seemed to be happening here I went ahead and completed improvements to the three string instructions, MVST, SRST, and CLST. All three had their execution times reduced by over 80%.

Wow! Fantastic work, Bob! Thank you!

I will commit this code in a day or two. I just want to double check everything and ensure that I have all of the test cases covering all condition codes and operand combinations (crossing pages, not crossing, etc).

I look forward to seeing your commit! :)

wably commented 5 years ago

String instructions committed by 8194b34

@Fish-Git if you are still seeing problems with tabs in the source, please inform.

Fish-Git commented 5 years ago

String instructions committed by 8194b34

Thank you, Bob! Have you created a runtest test case for these instructions yet?

if you are still seeing problems with tabs in the source, please inform.

I checked and I didn't see any.

wably commented 5 years ago

Fish,

My test cases are simple assembler programs that run under VM (they could also run under MVS with some JCL wrapped around them). There are three programs, one for each instruction improved. Each program does a number of tests, basically it sets up a situation and then executes the instruction. The results of the execution are displayed and it is compared to the known expectation (basically the results of these exact same tests before my modification). Then on to the next situation and test for the same instruction. Each possible condition code and combinations of page crossing operands (or not) were exercised and checked against known results.

So after all of that and to answer your question, no I do not have a ‘runtest’ test case; I don't even know what that is. I can make my programs available of course, if that is acceptable.

srorso commented 5 years ago

Hi Bob:

If you would post / PM the programs to me, I would be happy to create runtest cases.

Steve

wably commented 5 years ago

Steve,

Thanks for the offer, but you shouldn't have to do what I should have done in the first place.

After my response to Fish last night, I remembered that there was a 'tests' directory in the repo, and when I looked there I saw the kinds of tests that he was referring to. So, I can go ahead and create tests similar to those as appropriate for the string instructions.

It will definitely be more of a pain to put these together than the ALC programs I have now, but I recognize some reasons for having the tests be more or less standardized and not reliant on IPLing an OS prior to running a test. The learning curve to put these together effectively is the painful part; I can whip up stuff in ALC much faster and then move on.

Bob

srorso commented 5 years ago

Hi Bob,

I understand fully, but the offer stands. Teamwork and all that.

Harold Grovesteen's SATK package is a worthy tool, and I am happy to field questions, particularly about the BFP examples.

Steve

wably commented 5 years ago

This is definitely a case of "no good deed goes unpunished".

Fish-Git commented 5 years ago

This is definitely a case of "no good deed goes unpunished".

(LOL!) I just asked a simple question! :)

Fish-Git commented 5 years ago

@wably : I just committed some minor tweaks (*) to your code, mostly pertaining to rewording of comments to make them shorter and fit within 72-80 characters, but also some very minor coding tweaks (but only a few!) in a few places too.

I hope you don't mind. :/

(*) I just can't help myself sometimes! I'm convinced it's a form of OCD that just compels me to do sh^htuff like this. It's hard to control. :(

wably commented 5 years ago

Fish, I reviewed your changes and I am fine with them. Now that I can see how you want things to look I can make a better effort to comply with it. Actually, if you published a 'coding standards' document for that sort of thing, then it would be a lot easier to follow the guideline - if the look of the code is important to you.

Fish-Git commented 5 years ago

Actually, if you published a 'coding standards' document for that sort of thing, then it would be a lot easier to follow the guideline - if the look of the code is important to you.

The only real coding standard that exists doesn't really officially exist anywhere (i.e. it isn't documented anywhere). It's mostly just two things:

No tabs. Use spaces instead.
Braces should be on separate lines.

Everything else is just what I would call common sense programming:

Try to make your code simple and clear.
Use comments where needed. (what's obvious to you might not be to others!) (*)
Refrain from using explicit values. Use #define constants instead.
Don't repeat yourself. Create a callable function instead.
Don't abuse macros. Use inline functions instead. (**)
Don't be afraid to use blank lines to break up logical sequences of code. (***)
Etc...

There's probably some more I could add but that's good enough for now.

(*) This is especially important when there is anything that is even remotely "subtle". If something has to be done in a certain way because doing it differently would break something somewhere else, then that is definitely something that should be well commented!

(**) There's far too much of this in Hercules IMHO! Using macros that expand into a bunch of code not only makes it harder to maintain the code, but also harder to debug. Source-level debuggers can't step through macros nor allow you to set a breakpoint on a specific line of the macro, nor examine its variables, etc. Plus, macros that generate code aren't instruction cache friendly either. If you find yourself wanting to code a macro because you need to generate the same code in many different places, code an inline function instead! It's more host-processor instruction-cache friendly and is easier to debug, allowing you to set breakpoints and examine variables and step through the code, etc.

(***) Some might not appreciate that. Some programmers feel blank lines make the code take up too many lines vertically on your screen, making it more difficult to see as much of the code as possible. Me, I'm the opposite: I feel it's easier for the human being to absorb/understand the code when it's in smaller easier to consume bite-size chunks than in larger chunks that your mind has to then break apart. Besides, if you use your monitor in portrait mode instead of landscape mode (which IMO all programmers should do as it allows them to see more code!), then a few extra blank lines here or there is no big deal.
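
As an illustration of the "#define constants" and "inline functions instead of macros" guidelines, here is a small C sketch. The names (`SKETCH_PAGE_SIZE`, `MIN_MACRO`, `min_size`, `chunk_len`) are invented for this example and are not taken from the Hercules source. It also shows one further well-known macro hazard not mentioned above: arguments with side effects can be evaluated more than once.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096u   /* named constant, not a bare 4096 */

/* Macro form: pasted into every call site, and each argument is
   evaluated every time it appears in the expansion. */
#define MIN_MACRO(a, b)  ((a) < (b) ? (a) : (b))

/* Inline-function form: arguments are evaluated exactly once, the
   body has its own source lines for breakpoints, and the parameters
   can be examined in a debugger. */
static inline size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Bytes that can be processed at 'addr' without leaving its page. */
static inline size_t chunk_len(size_t addr, size_t remaining)
{
    return min_size(remaining,
                    SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE));
}
```

With `size_t i = 0;`, the expression `MIN_MACRO(i++, (size_t)10)` increments `i` twice, whereas `min_size(i++, 10)` increments it once.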

Fish-Git commented 5 years ago

Here are the rules I try to live by. I don't always adhere to many of them but I usually try to for most of them. (I think)

(Ref: https://www.goodreads.com/book/show/598624.Writing_Solid_Code)

Writing Solid Code

by Steve Maguire

Always ask, "Can this variable or expression over- or underflow?"
Always look for, and eliminate, flaws in your interfaces.
As you step through code, focus on data flow.
Create thorough subsystem checks, and use them often.
Define explicit function arguments. Eliminate ambiguity.
Design your tests carefully. Nothing should be arbitrary.
Document unclear assertions.
Don't allow unnecessary flexibility.
Don't clean up code unless the cleanup is critical to the product's success.
Don't fix bugs later; fix them now.
Don't hide bugs when you program defensively.
Don't implement nonstrategic features. There is no free lunch.
Don't keep 'trying' solutions until you find one that works. Take the time to find the correct solution.
Don't reference memory that you don't own.
Don't wait until you have a bug to step through your code.
Don't write multipurpose functions. Write separate functions to allow stronger argument validation.
Either remove implicit assumptions, or assert that they are valid.
Eliminate random behavior. Force bugs to be reproducible.
Enable all compiler warnings.
Fix the cause, not the symptom.
Handle special cases just once.
If something happens rarely, force it to happen often.
Implement your designs as accurately as possible.
Maintain both ship and debug versions of your code.
Make code intelligible at the point of call. Avoid Boolean arguments.
Make it hard to ignore error conditions. Don't bury error codes in return values.
Never allow the same bug to bite you twice.
Shred your garbage.
Step through every code path.
Strip undefined behavior from your code.
Strive to implement transparent integrity checks.
Throw away your bag of tricks. Be truly clever: Write boring code.
Tight C code does not guarantee efficient machine code.
Use a second algorithm to validate your results.
Use assertions to detect impossible conditions.
Use assertions to validate all function arguments.
Use lint to catch bugs your compiler may miss.
Write and test code in small chunks. Always test your code, even if that means your schedule will slip.
Write comments that emphasize potential hazards.
Write functions that, given valid inputs, cannot fail.

wably commented 5 years ago

Runtest cases for the three improved string instructions added by commit 44e4d07

Many thanks to Steve Orso, who provided assistance, examples, tips and advice in getting started with runtest cases!

Fish-Git commented 5 years ago

Runtest cases for the three improved string instructions added by commit 44e4d07

Many thanks to Steve Orso, who provided assistance, examples, tips and advice in getting started with runtest cases!

I appreciate the effort, guys! THANKS! :)

As new instructions are added (or existing ones enhanced) I feel it's important to add a runtest test case to our existing suite that verifies the instruction is [still] working properly. So I really appreciate the extra effort, guys. Thanks!

And just as an FYI, the proper way to "restore" an ostailor setting is to use ostailor default, not null.

That is to say, ~ostailor null is an ostailor setting unto itself that completely disables the displaying of any program interrupts (suppresses them all)~, but it is not the default setting when Hercules first comes up. That's what ostailor default is for: it sets the default ostailor value.

It's no big deal. I've already fixed it for you. It's just an FYI.

Fish-Git commented 5 years ago

ostailor null is an ostailor setting unto itself that completely disables the displaying of any program interrupts (suppresses them all)

(Ack!) WRONG!

"ostailor quiet" suppresses them all!

"ostailor null" displays them all!

HHC01603I help ostailor
HHC01603I
HHC01602I Command               Description
HHC01602I ----------------      -------------------------------------------------------
HHC01602I ostailor             *Tailor trace information for specific OS
HHC01603I
HHC01603I Format: "ostailor [quiet|os/390|z/os|vm|vse|z/vse|linux|opensolaris|null]".
HHC01603I Specifies the intended operating system. The effect is to reduce
HHC01603I control panel message traffic by selectively suppressing program
HHC01603I check trace messages which are considered normal in the specified
HHC01603I environment. The option 'quiet' suppresses all exception messages,
HHC01603I whereas 'null' suppresses none of them. The other options suppress
HHC01603I some messages and not others depending on the specified o/s. Prefix
HHC01603I values with '+' to combine them with existing values or '-' to exclude
HHC01603I them. SEE ALSO the 'pgmtrace' command which allows you to further fine
HHC01603I tune the tracing of program interrupt exceptions.
HHC01603I
HHC01603I ostailor quiet
HHC01603I pgmtrace
HHC02281I pgmtrace == none
HHC01603I ostailor null
HHC01603I pgmtrace
HHC02281I pgmtrace == all
HHC01603I ostailor default
HHC01603I pgmtrace
HHC02281I * = Tracing suppressed; otherwise tracing enabled
HHC02281I 0000000000000001111111111111111222222222222222233333333333333334
HHC02281I 123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0
HHC02281I                **    *     *                                   *

Sorry for the misinformation! :(

wably commented 5 years ago

@Fish-Git or @ivan-w Do you know if the Translate instructions in the list at the beginning of this issue can be checked off as completed, by the changes made in issue #107 and #108?

ivan-w commented 5 years ago

The Translate and Test (and various variations) are complicated to test for compliance, but it's certainly feasible. The specific issue with Translate and Test instructions is that any access exception only occurs 'as needed' - so if the 2nd operand crosses a page boundary (and the 2nd page isn't accessible) but the data in the 1st operand doesn't mandate access to the second page of the second operand, then everything should work without any exception. A complete unit test to test all conditions is tricky!
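
The 'as needed' rule can be modeled with a small C sketch. This is a simplified illustration, not the Hercules implementation: the function name, the result codes and the `accessible` parameter (the number of leading table bytes that lie in an accessible page) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum
{
    TRT_SKETCH_ALL_ZERO  =  0,   /* every function byte was zero     */
    TRT_SKETCH_FOUND     =  1,   /* nonzero function byte found      */
    TRT_SKETCH_EXCEPTION = -1    /* inaccessible table byte required */
};

/* Scan 'op1' left to right, using each byte as an index into the
   256-byte 'table'. Only the first 'accessible' bytes of the table
   are in an accessible page. An exception is reported only when an
   argument byte actually indexes past that point, never up front. */
int trt_sketch(const unsigned char *op1, size_t len,
               const unsigned char table[256], size_t accessible,
               size_t *pos)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char idx = op1[i];

        if (idx >= accessible)
            return TRT_SKETCH_EXCEPTION;   /* second page now needed */

        if (table[idx] != 0)
        {
            *pos = i;                      /* stop at first hit      */
            return TRT_SKETCH_FOUND;
        }
    }
    return TRT_SKETCH_ALL_ZERO;
}
```

An implementation that validated the whole 256-byte table before scanning would be simpler and faster, but would report an exception in cases where this model (and the architecture, as described above) would not.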

wably commented 5 years ago

@ivan-w Right, that I understand. But back to the question... do those changes you made also address the performance issue or is that still to be tackled?

ivan-w commented 5 years ago

The performance issues have been dealt with. TRT only required a few adjustments to deal with boundary issues just to be architecture compliant! The original version went through translation for every step... The optimized version did translation as a block transaction - but would issue an unnecessary exception in some borderline case. The current version has the optimized version and deals with the borderline case. The current issue is the test case which should test all the conditions for possible exception (or lack thereof).

wably commented 5 years ago

Improved performance for the CKSM instruction as well as runtest cases added by commit 9f27856

The execution time for CKSM was reduced by 82%, from 10.98 seconds to 1.89 seconds (500,000 iterations of CKSM on a 14,000 byte buffer of mixed data).
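
For context, here is a minimal C sketch of the kind of computation CKSM performs, assuming the usual description of the instruction: the operand is taken four bytes at a time as big-endian 32-bit words, each word is added to a 32-bit running sum with any carry out of the high-order bit added back into the low-order bit, and a final partial word is padded on the right with zeros. The function name and flat-buffer interface are invented for illustration, and this is not the code from the commit; check the Principles of Operation before relying on the details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit checksum with end-around carry over 'len' bytes of 'buf',
   starting from the running value 'sum'. */
uint32_t cksm_sketch(uint32_t sum, const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        uint32_t word = 0;

        /* Assemble one big-endian word; bytes past the end of the
           buffer are treated as zero (right padding). */
        for (size_t k = 0; k < 4; k++)
        {
            word <<= 8;
            if (i + k < len)
                word |= buf[i + k];
        }
        i += 4;

        /* Add, then fold the carry out of bit 31 back into bit 0. */
        uint64_t t = (uint64_t)sum + word;
        sum = (uint32_t)t + (uint32_t)(t >> 32);
    }
    return sum;
}
```

Written this way, the work is per word rather than per byte, which is the same general direction as the improvement reported above.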

Fish-Git commented 5 years ago

Thank you, Bob! Well done! :)

wgs777 commented 2 years ago

Thank you all for this observation. I've been running some benchmarks on the IBM MVS 3.8J with latest SDL-Hercules/Hyperion source code from GitHub as of 2022-05-24 compiled with gcc version 11.3.1-2 with default ./configure optimization options of gcc -O3 -march=native, etc on an Intel i7 4900 MQ.

The COBOL program shown below takes about 50 seconds to run on Hercules. This is compared with 6 seconds when run on a DEC PDP-10 36 bit Tops-10 using the klh10 emulator and DEC's COBOL compiler of the same exact COBOL benchmark program.

I know this "GIT Hercules instruction slowness issue" is listed with low priority, but it would really be a huge improvement to Hercules if the source code of its instructions could be optimized in a safe way with an abundance of unit tests to ensure existing functionality isn't broken.

The MVS 3.8J COBOL benchmark program I've been using is shown below. Since the variable Y is being displayed, it forces the compiler to do all the intermediate calculations. The use of the PMAP option shows the assembler object code generated.

benchcob.txt

Fish-Git commented 2 years ago

The COBOL program shown below takes about 50 seconds to run on Hercules. This is compared with 6 seconds when run on a DEC PDP-10 36 bit Tops-10 using the klh10 emulator and DEC's COBOL compiler of the same exact COBOL benchmark program.

Immaterial. IBM mainframes use a completely different machine architecture than the DEC PDP-10. They are not the same (nor even remotely similar). The two machine architectures are completely different from one another, and as a result, so are the hardware emulators themselves too. You are comparing apples to oranges.

The MVS 3.8J COBOL benchmark program I've been using is shown below. Since the variable Y is being displayed, it forces the compiler to do all the intermediate calculations. The use of the PMAP option shows the assembler object code generated.

benchcob.txt

(emphasis added)

You are presuming that we too are also familiar with, and run, the IBM MVS 3.8J operating system on Hercules. Such a presumption is erroneous. I know nothing about IBM MVS 3.8J, so your providing the COBOL source code is personally unhelpful to me.

What would be more helpful would be seeing the actual object code (i.e. the actual hardware instructions) that the IBM MVS 3.8J COBOL compiler generates (i.e. this presumed "PMAP" output you mention) so that we can review each of the instructions involved to see whether there is maybe anything we can do to improve their respective performance on Hercules.

Thanks.

wgs777 commented 2 years ago

Thank you for your time on this. I've uploaded the file below which contains the Cobol + Assembler code + additional info.

assembler_and_cobol_code.txt

Fish-Git commented 2 years ago

I've uploaded the file below which contains the Cobol + Assembler code + additional info.

assembler_and_cobol_code.txt

Thank you. I will take a look at it.

Couple of things however:

Your COBOL program's comments state: "... doing millions of integer and floating point additions and divisions", but that is not what I am seeing in the generated object code. I am seeing mostly decimal instructions (ZAP/PACK/UNPK/AP/SP). I see zero integer or floating point instructions and no divisions at all.
I also see a lot of function calls as well (e.g. LM 2,3,1E8(13), L 15,00C(0,12), BALR 14,15), and can't help but wonder what they are doing and whether they may actually be where your slowness is (in addition to the already mentioned use of decimal instructions, which any decent programmer should know are slow as hell).
It appears you may have created your file via a copy & paste from an 80-column wide terminal screen, as many of the listed instructions in the COMPUTE section you are most concerned about appear to be truncated at column 80 (e.g. ZAP 1C0(16,13),060(10,). While technically not absolutely necessary (since we can see the entire machine code instruction, e.g. F8 F9 D 1C0 D 060 after all), it would have perhaps been better to have provided the actual printer listing instead.
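
To show how the truncated operands can be recovered from the machine code alone, here is a small C sketch that decodes a six-byte SS-format instruction with two length fields (the format used by ZAP, PACK, UNPK, AP and SP). The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decoded fields of an SS-format instruction with two lengths. */
typedef struct
{
    unsigned opcode;
    unsigned len1, base1, disp1;   /* first operand  */
    unsigned len2, base2, disp2;   /* second operand */
} ss_operands;

/* Layout: opcode | L1 L2 | B1 D1(12 bits) | B2 D2(12 bits).
   The length nibbles hold "length minus one". */
ss_operands decode_ss(const uint8_t insn[6])
{
    ss_operands o;

    o.opcode = insn[0];
    o.len1   = (insn[1] >> 4) + 1u;
    o.len2   = (insn[1] & 0x0Fu) + 1u;
    o.base1  = insn[2] >> 4;
    o.disp1  = ((insn[2] & 0x0Fu) << 8) | insn[3];
    o.base2  = insn[4] >> 4;
    o.disp2  = ((insn[4] & 0x0Fu) << 8) | insn[5];
    return o;
}

/* Render the operands in the listing's D(L,B),D(L,B) style. */
void format_ss(const char *mnemonic, const ss_operands *o,
               char *out, size_t outlen)
{
    snprintf(out, outlen, "%s %03X(%u,%u),%03X(%u,%u)", mnemonic,
             o->disp1, o->len1, o->base1,
             o->disp2, o->len2, o->base2);
}
```

Feeding it the bytes F8 F9 D1 C0 D0 60 (the listing's F8 F9 D 1C0 D 060) yields `ZAP 1C0(16,13),060(10,13)`, which completes the operand that was cut off at column 80.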

That's all for now. Let me review our decimal instructions (PACK/UNPK/ZAP/AP/SP, etc) to see whether there is maybe anything that can be done to improve their performance any.

In the meantime maybe you can research what those function/subroutine calls are doing since they are being called a million times too. If they are each executing many, many instructions, they may actually be where your slowness lies, not the decimal instructions.

Does the version of COBOL compiler you are using have any type of optimization option? To be honest, the generated object code I'm seeing doesn't look very efficient!

Thanks.

wrljet commented 2 years ago

Fish,

I forget the exact date, but the legal COBOL compiler most people use with MVS 3.8 dates from around 1970 or something. It came from the OS/360 MVT era and has been patched to run with modern dates, if I remember correctly. (I'm not a COBOL guy)

Bill

Fish-Git commented 2 years ago

Fish,

I forget the exact date, but ...

And this means ..................... what? Are you trying to explain why it doesn't (or likely doesn't) have any type of optimization option? It's unclear to me what information your comment is trying to convey!

wgs777 commented 2 years ago

I've attached the complete untruncated listing here:

With the untruncated listing, we can see it is calling on this function: ILBOBID2:

L     15,00C(0,12)          V(ILBOBID2)

Searching for ILBOBID2 in manual GC28-6399-02 "COBOL Compiler and Library, Version 2, Programmer's Guide", we can see on page 278 that it's a COBOL Library Conversion subroutine for converting from Binary to Internal Decimal.
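
To give a feel for why binary-to-decimal conversion is not free, here is a minimal C sketch that converts a binary integer to an 8-byte packed decimal field (two digits per byte, sign in the last nibble, C for plus and D for minus). It only illustrates the class of work involved; it is not the ILBOBID2 routine, and the function name is invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert 'value' to 8-byte packed decimal: 15 digits followed by a
   sign nibble (0xC = positive, 0xD = negative). */
void to_packed_sketch(int32_t value, uint8_t out[8])
{
    uint64_t mag = value < 0 ? (uint64_t)(-(int64_t)value)
                             : (uint64_t)value;

    memset(out, 0, 8);

    /* Rightmost byte: low-order digit plus the sign nibble. */
    out[7] = (uint8_t)(((mag % 10) << 4) | (value < 0 ? 0x0Du : 0x0Cu));
    mag /= 10;

    /* Remaining bytes, right to left: two digits each, which costs
       one divide and one remainder per digit. */
    for (int i = 6; i >= 0 && mag != 0; i--)
    {
        uint8_t lo = (uint8_t)(mag % 10);
        mag /= 10;
        uint8_t hi = (uint8_t)(mag % 10);
        mag /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
}
```

Done once this is cheap, but repeated on every iteration of a million-pass loop, together with the call overhead, it adds up.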

So you are correct. The compiler appears to be generating inefficient assembler code using decimal numbers and inefficient library routines. I checked the manual and this compiler doesn't have an optimize option. Unfortunately we are stuck with this compiler as it's the only legal IBM COBOL Compiler available for TK4-.

This compiler was the direct predecessor to OS/VS COBOL and COBOL II after that. It's unknown if IBM ever improved its compiler code generation, so later compilers might generate similar code.

I'm planning to do a large amount of COBOL development using TK4- and this compiler. It would be beneficial to check the Hercules implementations of the assembler instructions that appear most often in the attached compiler listing (like you suggested): (ZAP/PACK/UNPK/AP/SP) to look for any opportunities to improve/optimize the Hercules code.

IBM hardware historically has been used primarily for business programs, and most business programs are written in COBOL. Optimizing Hercules' instruction implementations for assembler instructions that are commonly generated by COBOL compilers could really be a worthwhile improvement to the Hercules emulator.

wrljet commented 2 years ago

Fish, I forget the exact date, but ...

And this means ..................... what? Are you trying to explain why it doesn't (or likely doesn't) have any type of optimization option? It's unclear to me what information your comment is trying to convey!

I'm suggesting it may not have the best optimization.

Fish-Git commented 2 years ago

So you are correct. The compiler appears to be generating inefficient assembler code using decimal numbers and inefficient library routines. I checked the manual and this compiler doesn't have an optimize option. Unfortunately we are stuck with this compiler as it's the only legal IBM COBOL Compiler available for TK4-.

It's been far too many years since I've messed with COBOL, but have you investigated making changes to your program itself to be more efficient? For example, you have variables A, B and C defined as "COMP" (i.e. Binary?), but variables X and Y are both defaulting to non-binary (i.e. Decimal?). Since your COMPUTE statements in your A10-LOOP subroutine are performing calculations based on both variable types, it's no wonder there is a lot of converting from Binary to Decimal going on!

Try declaring variables X and Y as "COMP" too to see whether that helps improve performance any. I suspect it might!

Fish-Git commented 2 years ago

Or... Since your first COMPUTE statement is doing a divide (which might explain your 9(9)V9(9) "PIC" for variables X and Y), maybe declare ALL of them as "PIC 9(9)V9(9) COMP-2" (Floating Point?) instead? Then maybe there might not be any conversion going on at all?

My point is, try tweaking your COBOL program itself. It too seems to be inefficiently written IMHO. <shrug>

Fish-Git commented 2 years ago

My point is, try tweaking your COBOL program itself. It too seems to be inefficiently written IMHO.

That having been said, I'm not saying that Hercules's Decimal instructions implementation is as efficient as it can be. There may well be room for much improvement, I don't know. I have not investigated that angle yet.

All I'm saying is, asking the emulator to improve the inherent inefficiency of your COBOL programs is the wrong way to approach the problem. Instead, you should investigate how to write more efficient COBOL programs. It is not the emulator's responsibility to make up for poorly (inefficiently) written COBOL programs. Rather, it is the programmer's responsibility to write more efficient COBOL programs.

At least that's the way I'm seeing things right now.

As I said, I'm not saying there couldn't be some improvements that we could make to Hercules to improve the performance of our Decimal instructions. That may well be true. I don't know.

But why should I/we go to that effort when there are obvious inherent inefficiencies in your program itself? As I said, it's not the responsibility of the emulator to make up for poorly written inefficient programs.

wgs777 commented 2 years ago

I chose not to use COMP-2 for portability, as it's not supported by some non IBM compilers. In MVS 3.8J a few minutes ago, I tried defining COMP-2 for X,Y, and ran the program and got all zeros in the "Y" result, which is a different result than expected.

Page 189 of the IBM Cobol manual details the rules for conversion of data types. It appears to always do significant conversions no matter what data types are being used. The manual also mentions that for longer data types over 15 digits like PIC 9(9)V9(9) it needs additional instructions and subroutine calls. The performance hit for these conversions seems to be more severe than with other non IBM Cobol compilers.

Since we're only concerned here with Hercules, I've decided to temporarily take the Cobol compiler out of the equation. Instead, I'll write a similar portable benchmark program in Fortran where Hercules should have the advantage since the IBM Fortran H compiler is famous for its powerful optimizations. I'll then run that program on the same emulators as before, and see if Hercules and the other emulators have similar timings.

Also, I'll do some additional testing on the IBM Cobol compiler (since I'll be needing to use it a lot on Hercules), using the most efficient data types to try to bring down the CPU usage.

If you do get time to take a very quick look at the coding for these instructions in Hercules: ZAP/PACK/UNPK/AP/SP, just to see if there are any huge optimizations to be had it would be interesting to know.

Thanks again.

wgs777 commented 2 years ago

I just wanted to provide an update. I ran a few Fortran Benchmarks with Hercules and also my PDP-10 emulators. Hercules was slightly faster than my other emulators with the Fortran benchmarks. So, the performance issues I observed with COBOL were most likely unrelated to Hercules.

In trying to tune the COBOL program with many strategic variable types, I've proven that this COBOL compiler generates 25+ subroutine calls even when defining the X,Y variables as [1] COMP SYNC, [2] COMP-3, and [3] COMP-2.

The run times for [1] and [2] above were both 50 seconds, matching the original "inefficient" run with X,Y being USAGE Display. The [3] run using X,Y as COMP-2 runs faster, but with a zero result. For [3] run using X,Y as COMP-2, the compiler is still generating assembler using these 27 subroutine calls.

0003C6  58 F0 C 004                 L     15,004(0,12)
0003D6  58 F0 C 008                 L     15,008(0,12)
000412  58 F0 C 00C                 L     15,00C(0,12)
00042E  58 F0 C 010                 L     15,010(0,12)
000434  58 F0 C 014                 L     15,014(0,12)
00044E  58 F0 C 010                 L     15,010(0,12)
000468  58 F0 C 00C                 L     15,00C(0,12)
0004A2  58 F0 C 00C                 L     15,00C(0,12)
0004BE  58 F0 C 010                 L     15,010(0,12)
0004C4  58 F0 C 014                 L     15,014(0,12)
0004DE  58 F0 C 010                 L     15,010(0,12)
0004F8  58 F0 C 00C                 L     15,00C(0,12)
000542  58 F0 C 00C                 L     15,00C(0,12)
00055E  58 F0 C 010                 L     15,010(0,12)
000564  58 F0 C 014                 L     15,014(0,12)
00057E  58 F0 C 010                 L     15,010(0,12)
000598  58 F0 C 00C                 L     15,00C(0,12)
000608  58 F0 C 004                 L     15,004(0,12)
000618  58 F0 C 008                 L     15,008(0,12)
000646  58 F0 C 018                 L     15,018(0,12)
00065E  58 F0 C 008                 L     15,008(0,12)
000686  58 F0 C 00C                 L     15,00C(0,12)
00069C  58 F0 C 00C                 L     15,00C(0,12)
0006B8  58 F0 C 00C                 L     15,00C(0,12)
0006DC  58 F0 C 01C                 L     15,01C(0,12)
0006EE  58 F0 C 020                 L     15,020(0,12)
00073A  58 F0 C 024                 L     15,024(0,12)

My thoughts here are that this compiler relies too much on subroutine calls. This slowness wouldn't be noticeable in a normal COBOL program, but becomes evident with a benchmark program.

Thanks all for responding. I really appreciated your time on this.

wgs777 commented 2 years ago

I saw some commits on the CLCLE and TRE instruction performance improvement.

Would anyone know whether these 2 instructions have performance improvements and, if so, what % improvement?

Fish-Git commented 2 years ago

> I saw some commits on the CLCLE and TRE instruction performance improvement.
>
> Would anyone know if these 2 instructions have performance improvements and if so, what % improvement?

You can determine the improvement in performance yourself by running either of the performance tests (CLCLE-04-performance.tst and TRE-02-performance.tst) in the "tests" subdirectory on a version of Hercules from before the commit and then again on one from after it.

If you need help on how to run the tests, just ask. The most important thing you'll need to do to run them, however, is to enable them beforehand (they're both disabled by default) by uncommenting the appropriate statements in each test script. (Refer to the test scripts themselves for more information.)

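Tests performed by both James (the author of the commits) and myself have measured improvements in the 95%+ range. A truly dramatic increase! For details, simply refer to the Git "Pull" requests themselves:

CLCLE instruction performance https://github.com/SDL-Hercules-390/hyperion/pull/500

TRE instruction performance https://github.com/SDL-Hercules-390/hyperion/pull/498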

wgs777 commented 2 years ago

Thank you. I will try out the tests before getting the updates and after.

I have one unrelated question, and I'm not sure if this is the correct place to ask it. Apologies if I am posting it in the wrong forum or place. Please let me know the proper forum to ask it.

Here is the question: I have Hercules + gists fully built using gcc 12.2.1, and my Hercules runs Tk4- MVS 3.8J fine with 2 emulated MVS CPUs. Everything seems to work perfectly. But I noticed that I get the following error in the startup logs:

HHC90020W 'hthread_setschedparam()' failed at loc=commadpt.c:1310: rc=22: Invalid argument

The error occurs on multiple lines (see below). Would you know what the fix would be? I assume it's a configuration issue. Look for the words "Invalid argument" at the end of several of the lines.

HHC01004I 0:066A COMM: listening on port 37913 for incoming TCP connections
HHC00100I Thread id 00007f2f749a7640, prio 0, name '3705 device(1:0669) thread' started
HHC00100I Thread id 00007f2f747a5640, prio 0, name '3705 device(1:066A) thread' started
HHC01004I 0:066B COMM: listening on port 37914 for incoming TCP connections
HHC00100I Thread id 00007f2f745a3640, prio 0, name '3705 device(1:066B) thread' started
HHC90020W 'hthread_setschedparam()' failed at loc=commadpt.c:1310: rc=22: Invalid argument
HHC00100I Thread id 00007f2f74493640, prio 0, name '0:0670 communication thread' started
HHC01004I 0:0670 COMM: listening on port 37801 for incoming TCP connections
HHC90020W 'hthread_setschedparam()' failed at loc=commadpt.c:1310: rc=22: Invalid argument
HHC00100I Thread id 00007f2f74392640, prio 0, name '0:0671 communication thread' started
HHC01004I 0:0671 COMM: listening on port 37802 for incoming TCP connections
HHC90020W 'hthread_setschedparam()' failed at loc=commadpt.c:1310: rc=22: Invalid argument
HHC00100I Thread id 00007f2f74291640, prio 0, name '0:0672 communication thread' started

Thanks, Bill


Fish-Git commented 2 years ago

Hi Bill!

> I have one unrelated question - not sure if this is the correct place to ask this question. Apologies if I am posting it in the wrong forum or place. Please let me know the proper forum to ask it.

This is not the correct forum/place. GitHub Issues are meant for reporting bugs, requesting enhancements, or Hercules developer discussions, etc. They are not meant for providing help with running Hercules (e.g. simple user/usage questions).

Additionally, we try to keep each GitHub Issue focused on the problem/issue at hand, and try hard to not let the discussion stray from the original topic. All of Hercules's various support forums (we have many!) are documented on our "Technical Support" web page:

https://sdl-hercules-390.github.io/html/hercsupp.html

I would suggest you ask your question on the primary Hercules emulator support forum. Someone there should be able to help you. (I could help you right here and now but am purposely refraining from doing so because I don't want to set a precedent by doing so.)

Hope that helps!

(P.S. See below!)

FYI: I would very much appreciate it if you would not respond/reply to GitHub Issues via email. (*)

_I would much prefer that you instead respond/reply directly via the GitHub Issues web page itself:_

https://github.com/SDL-Hercules-390/hyperion/issues

_When you reply directly via their web page, I can make minor edits to your reply so it is more readable (prettier) by editing the fonts being used, formatting the log messages, etc._

When you reply via email however, I cannot edit your reply (GitHub does not allow it), so oftentimes it is much harder (more difficult) to read.

It is up to you whether or not you want to take the time to reply via their web page or continue to reply via email, but I would much rather that you reply directly via their web page instead.

Thanks

(*) GitHub does not support formatting of email replies, making it impossible for me to fix the formatting of a person's reply for readability. Thank you for understanding.

Fish-Git commented 9 months ago

Well, James Wekel has only just today managed to complete the one final remaining instruction listed in this GitHub Issue, so now, between Bob Polmanter and James Wekel, ALL of the instructions listed have had their performance improved quite dramatically from how slowly they were executing before! THANK YOU, you guys! You're the best! :))

Closing issue as completed! :))

SDL-Hercules-390 / hyperion

Performance of some instructions could be significantly improved #101
