Transactional-Execution Facility design / implementation

Fish-Git commented 5 years ago

NOTE: This issue has been closed and is now being continued in a NEW GitHub issue, #339: "Transactional-Execution Facility... (continued) "

I have been told the recently released z/OS 2.4 requires the availability of both the Transactional-Execution Facility and Constrained-Transactional-Execution Facility in order to successfully IPL:

Transactional-Execution enforcement

z/OS V2.4 provides enforcement to the effect that hardware Transactional-Execution, available on IBM Z servers since zEC12, is always available to programs running on z/OS V2.4 so that programs in such an environment can unconditionally assume and make use of Transactional-Execution.

https://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/0/877/ENUSZP19-0410/index.html

This GitHub Issue is being created so that we can, together, discuss how best to implement this facility. Please offer your suggested approach/design as a GitHub comment reply to this issue.

I myself have a vague idea of how maybe it might be implemented, but I don't know if it will even fly nor especially how good it is. It's entirely possible (even likely!) that one of you might be able to come up with a better idea. (Please?)

Besides the description of the facility in the Principles of Operation manual, here are some additional papers I found on the subject to help get your creative juices flowing:

Transactional-Execution Facility.zip

I've also assigned everyone to this issue because I really, really want everyone to contribute with their own thoughts/ideas on how is the best way to implement this facility in Hercules.

Thanks!

_NOTE: This issue should be considered a specific sub-issue of issue #77 "MISSING Facilities support".__

EDIT: The following items still remain to be done:

(Feel free to add additional items as needed)

[ ] FORMAL TESTING! We still need a comprehensive set of tests (preferably standalone runtests) to verify proper functionality of all aspects of TXF! Right now we're simply using z/OS and z/VM and just presuming it's working correctly as long as both operating systems appear to function "normally", but of course that is not good enough!
[x] 'txf' tracing
[x] #define DEBUG tracing (TRACE macro)
[x] PTT tracing (PTT_TXF ==> PTT( PTT_CL_TXF ...)
[x] Constrained transactions constraint: 2. "All instructions in the transaction must be within 256 contiguous bytes of storage, including the TRANSACTION BEGIN (TBEGINC) and any TRANSACTION END instructions." (page 5-107)
[x] Constrained transactions constraint: 4. "The transaction’s storage operands access no more than four octowords. Note: LOAD ON CONDITION and STORE ON CONDITION are considered to reference storage regardless of the condition code." (page 5-109)
[ ] Constrained transactions constraint: 5. "The transaction executing on this CPU, or stores by other CPUs or the channel subsystem, do not access storage operands in any 4 K-byte blocks that contain the 256 bytes of storage beginning with the TRANSACTION BEGIN (TBEGINC) instruction." (page 5-109)
[ ] Constrained transactions constraint: 7. "Operand references made by each instruction in the transaction must be within a single double-word, except that for LOAD ACCESS MULTIPLE, LOAD MULTIPLE, LOAD MULTIPLE HIGH, STORE ACCESS MULTIPLE, STORE MULTIPLE, and STORE MULTIPLE HIGH, operand references must be within a single octoword." (page 5-109)
[ ] PER as it relates to TXF. (pages 4-26++, 5-89++)
[x] SIE as it relates to TXF. (Refer to GitHub Issue Comment below for some details)
[x] FPCR update on abort: "In addition to the diagnostic information saved in the TDB, when a transaction is aborted due to any data-exception program-exception condition and both the AFP-register control, bit 45 of control register 0, and the effective allow-floating-point-operation control (F) are one, the data-exception code (DXC) is placed into byte 2 of the floating-point control register (FPCR), regardless of whether filtering applies to the program-interruption condition." (page 5-97)

Note that constraints #4 and #7 seem to contradict one another. One says four octowords (4x32=128 bytes) whereas the other says a single double-word (8 bytes). Unless #4 means four octowords in total?? Constraint 5 is going to be next to impossible. Not sure about 7.

ivan-w commented 4 years ago

Ok.. Let's say a SIE guest performs a store while within a transaction... Your local copy of the TLB will indicate a virtual address... You need to ensure that no other subsystem is accessing it (to raise the conflict abort) - this can only be done at the HOST level!

The transaction memory access needs to be performed at the lowest level/host level - so it now becomes independent on whether you are running SIE or NOT!

If you are running SIE, SIE gets an intercept due to the abort condition. If not running SIE, the abort condition gets reported directly.

When a transaction is committed it needs to be made visible atomically for the ENTIRE complex - and that can only be done at the lowest level (SIE or NOT SIE - it is irrelevant).

GRSM stuff is specific to whether you are in SIE or not, and how it is handled is dependent. (In SIE mode, the GPRS need to be put back, without SIE they just go in REGS)..

In SIE mode the PSW for the restart needs to be put in the SIEBK... for non SIE mode it goes in host regs...

Fish-Git commented 4 years ago

Can you be more specific about how your z/OS 2.4 system behaves differently?

Well, the first thing that's different is when I IPL it asks for IPL options. A reply of just r 0 (which I believe means "use the default options whatever they are") gets past it. After replying to that it immediately displays a bunch of stuff and complains about the (page file?) being in use, requesting a "deny" or "continue" response (to which I always reply r 0,continue).

It then proceeds to IPL and bring up the system. But of course since I don't have a full system not everything "comes up". I see a several crashes occurring (can't remember off hand which ones) and some 01 message about something which appears to be auto-replied to later on.

Later, an 02 message appears asking me for JES2 options which I have no idea how to reply to. So I don't. I just leave it on the screen.

The system seems to chug along for a while until it eventually reaches a point where all 6 processors (I have my Hercules z/OS config defined with 6 CPUs) are all sitting at about 17% (which is 100% divided by 6) with the system banging away doing all kinds of I/O to the SMF volume. That's where I usually stop the system and power off. (I've let it sit there before for many, many minutes, but no further progress appears to be made. So like I said when it reaches that point I consider the IPL/test to be complete and stop the system and power off.)

Needless to say this behavior is obviously very different from the way z/OS normally behaves during IPL and startup. :)

And as I said I suspect it's the partial/incomplete (and possibly partially customized?) minimal system I was loaned to test with. I don't know z/OS but after having IPLed several different versions of it hundreds of times over the years as a Hercules developer, I know what appears to be a normal IPL/startup and what doesn't, and I have to tell you what I'm seeing definitely doesn't appear to be a normal IPL/startup! ;-)

Can you send me your changes?

Absolutely! Just let me package up what I've got and I'll send it to you.

I will test that on my [z/OS 2.4] system. I will also set up a test of running z/OS 2.4 under z/VM 6.4 and let you know the results.

I would appreciate that very much! Thanks!

I had not gotten to channel.c yet. As I stated earlier, it is a work in progress.

Understood. But no worries. I've already got it taken care of in my version. :)

Also, do you have the coding standards documented? I would be happy to comply with standards.

Not really, no. The only hard standard we have is:

NO TABS. All tabs should be spaces instead.
The tab setting should be set to 4.

That's pretty much it. <shrug>

As far as brace usage is concerned I prefer them to be on separate lines but others don't. Either way is fine. Just follow the style already being used in the code seems to be the rule.

Jagged comments are ugly! Try to take the time to line them up. Liberal use of blank lines to break up code into smaller more easily digestible chunks makes it easier (IMO) for one's brain to absorb what the code is doing/accomplishing. But then that's probably me.

Other style issues (such as coding a function call as: func( foo, bar ) rather than func(foo,bar), i.e. using blanks here and there for clarity) are all personal style issues as well. Others don't care for it whereas I do. Anything that makes the code easier to read and understand and maintain is good.

Another thing I personally prefer is to keep if statements separate from other statements, such as assignment statements that precede or follow it. But again, that might just be me. It's not a hard rule. NONE of what I've written above (except the tab issue) are hard rules.

OH YEAH! One other thing that I should probably add to the above HARD STANDARD is the STRONG discouragement of using hard coded values! #define the value with a name instead and use that! PLEASE!

Using hard coded values makes the code much more difficult to maintain and opens the door to the introduction of subtle hard to find bugs! I had all kinds of trouble coding CCKD64 support due to all of the hard coded values that Greg had sprinkled throughout his code! Your use of hard coded numeric values in your program_interrupt code in cpu.c is a good example of how NOT to code! :)

That's pretty much it I guess. <shrug>

Hope it helps!

p.s. Except for your coding style I have to say that, overall, I am very impressed with your overall design/implementation of TXF! Well done and THANK YOU! Much appreciated my friend. You're proving to be quite a valuable(*) team member. :))

(*) All of our team members are quite valuable, not just you of course! But as a relative newcomer to Hercules development I have to admit you're not doing half bad!

rwoodpd commented 4 years ago

The reply is either due to incorrectly specified load parms (probably not the case), or something not quite right in SYS1.IPLPARM. You probably want to reply COLD to the JES2 WTOR. Not sure about the crashes. If you want, I can send you a one pack z/OS 2..4 system that IPLS OK on my system. It might be too big to attach here, so I will set it up on a FTP site. It would be a simple zip of the AWS file and the config file I used. I will also supply logon id, pswd, etc.

Fish-Git commented 4 years ago

Can you send me your changes?

Here you go:

(Be sure to read the READMEs!)

Fish-Git commented 4 years ago

Bob wrote:

The reply is either due to incorrectly specified load parms (probably not the case), or something not quite right in SYS1.IPLPARM.

BOTH!

They did say they had customized their SYS1.IPLPARM(LOAD00) member so that it would not work with the original volumes I was given, and was sent a replacement copy I was told to replace it with. That's what I'm using: the SYS1.IPLPARM(LOAD00) they sent me that they said I should use. But I don't think it's right either.

You probably want to reply COLD to the JES2 WTOR.

Thanks. I'll try to remember to give it a try next time.

If you want, I can send you a one pack z/OS 2..4 system that IPLS OK on my system.

Would this be the SARES1 volume? If so, I was under the impression that emergency stand-alone volumes (such as what I believe SARES1 is) were purposely absolute minimal systems providing only enough of a working system to allow you to fix whatever caused your real system to break. As such, I'm doubting it uses TXF at all! Have you verified that? Have you verified that the SARES1 volume actually uses TXF?

Regardless, I'll pass for now. If push comes to shove I'll try to get my original benefactor to send me a complete set of unmodified/uncustomized volumes (if he still has them). Besides, as I said, I think that MAYBE I might have a way of testing on my own without using z/OS at all. I just need some time to think about it some more and then fling together the code. It'll take me a while but I think it'll be worth it in the end. Having something of our own we can turn into a runtest test to verify (as best we can!) the proper functioning of our code is something we really, REALLY need to have. Such is critical IMHO to the success of this project, given its importance and complexity.

Fish-Git commented 4 years ago

Ivan wrote:

Ok.. Let's say a SIE guest performs a store while within a transaction...

Okay...

Your local copy of the TLB will indicate a virtual address...

Yes, the virtual address that was dynamically translated is indeed a part of the TLB. So?

You need to ensure that no other subsystem is accessing it (to raise the conflict abort)

What is "it"? The virtual address? Or the real address that DAT translated that virtual address to? (which is also what is saved in the TLB)

Please remember that TXF operates on absolute addresses (which, ignoring prefixing for the moment, is the same as the real address that DAT logic (MADDRL macro and maddr_l function) returns to the caller, which is a HERCULES MAINSTOR ADDRESS. You know? Mainstor? That chunk of memory we malloc at impl time that represents (emulates) a real mainframe's REAL [absolute] STORAGE? ALL of it? The storage that all CPUs in the complex access?

- this can only be done at the HOST level!

And it is!

Look, SIE works just fine today, does it not? Forget about TXF for the moment. Think just SIE as it works today. SIE guests can store into memory just as the SIE host can store into memory, and whether you're an SIE guest or the SIE host, the "store" that occurs eventually has to update the real system's -- the actual mainframe's -- actual real storage does it not? That's known as Hercules mainstor.

And all that works JUST FINE today without TXF, does it not?

So what's your beef? Where's the problem? Hercules's DAT logic (MADDRL maro --> maddr_l function in dat.h) returns a guest (Hercules guest) real address to the caller, which is a Hercules mainstor address to us (i.e. to Hercules). It is that address that TXF operates at. At the actual, real, emulated storage level. At the emulated hardware level. The emulated mainframe's real storage level.

Forget about SIE. It's a red herring. There should be no change needed to SIE. It works fine. It accesses/updates the system's real storage just fine. Seamlessly. Just as always. (If it didn't z/VM wouldn't work at all!)

The transaction memory access needs to be performed at the lowest level/host level

Correct! And it does! Have you even bothered to look at Bob's code to see how it works?

- so it now becomes independent on whether you are running SIE or NOT!

WRONG! Very, very wrong!

I can see absolutely no need for any SIE considerations at all. None.

Carve or burn this into your forehead until it sinks in:

TXF operates at the Hercules mainstor level. SIE hostregs/guestregs is unimportant. If today's DAT logic -- whether called in SIE guest mode or SIE host mode, it matters not -- can manage to return the correct real address to the caller, and as long as it is only that real address that TXF operates at, then TXF should work just fine.

I realize that's a lot to carve onto or burn into such a tiny forehead as yours (heh! just kidding of course!), but please do try. It's important. ;-)

If you are running SIE, SIE gets an intercept due to the abort condition. If not running SIE, the abort condition gets reported directly.

Yeah, so?

When a transaction is committed it needs to be made visible atomically for the ENTIRE complex - and that can only be done at the lowest level (SIE or NOT SIE - it is irrelevant).

CORRECT! And as I said, it does! Take a look at Bob's code, please! I beg you! Once you do you'll see that there are really no SIE considerations at all! None. TXF operates at the hardware storage level (i.e. the emulated hardware storage level of course, i.e. at the mainstor level).

GRSM stuff is specific to whether you are in SIE or not, and how it is handled is dependent.

Wrong again! (Sheesh!)

Look, Hercules's emulated General Purpose Registers are managed just fine today, are they not? When an instruction is executed that modifies one of the registers, we update regs->GR(n) accordingly, not caring even one iota whether that instruction is being executed while in SIE mode or not. And all of that works fine today. SIE mode or not.

(In SIE mode, the GPRS need to be put back, without SIE they just go in REGS)..

I'm not sure what you mean by that but I'm sure it's wrong. ;-)

(Heh! I'm having fun. How about you? I love arguing with you. I almost always learn something in the end and we end up passing along [what I hope is] valuable information to the other developers, so it's almost always a very productive (and good natured!) argument, which I find to be extremely cool.)

In SIE mode the PSW for the restart needs to be put in the SIEBK... for non SIE mode it goes in host regs...

Nope. The PSW that gets updated (during transaction abort which is what I presume you're referring to?) is whichever one regs points to. PERIOD. Just like the old/new PSWs that get saved/fetched from low core when, say, an I/O or Program interrupt occurs. Just like today. No change needed. No special SIE considerations needed. SIE handles all of that just fine today so there's nothing additional we need to do for TXF. None. Just use regs and we should be good.

Fish-Git commented 4 years ago

Fish wrote:

Bob wrote:

Additional code does need to be added in the abort logic. I will work on that also.

What "SIE-specific" code do you intend to add to the abort logic? Because in my mind, none of the TXF code needs to know anything about SIE. Why should it care?

Hmmm... You may be right, Bob. (@rwoodpd)

You too Ivan! (@ivan-w)

After taking a nap and sleeping on the SIE and hostregs/guestregs issue we've been arguing about, I just woke up realizing that the code in the maddr_l function that checks for store/fetch conflicts with other CPUs *`()`** does indeed need to be checking both hostregs and guestregs! (Oops!)

So I was wrong. There are "SIE considerations" that need to be taken into account! (kind of, sort of)

I apologize to both of you. Mea culpa! :(

(grumble..)

Let me make that change and re-"test" to see how much of a difference it makes...

(See? I told you I'd probably eventually learn something! That's why I like arguing with you Ivan! You keep me on my toes!)

*`()** Where it'sforlooping throughsysblk.regs[i]from 0 tosysblk.hicpu`.

Fish-Git commented 4 years ago

@rwoodpd:

Fish wrote:

Let me make that change and re-"test" to see how much of a difference it makes...

FYI: It didn't seem to make any difference -- for me. But that's probably only because I'm testing with z/OS (which I don't think uses SIE). I suspect it will (would) make a difference for z/VM though.

Also, FYI #2:

Fish wrote

Bob wrote:

Can you send me your changes?

Here you go:

Source code

Prebuilt Windows binaries

(Be sure to read the READMEs!)

The above .zip files have been updated and re-uploaded with my latest changes, so you might want to re-download them. The only real change was to the transact.c source file of course (to check both hostregs and guestregs), but I of course also had to update the corresponding patch/diff files and resulting binaries too.

p.s. There's still a race condition somewhere with sysblk.txf_transcpus. Sometimes (but very rarely) I'm crashing in UPDATE_SYSBLK_TRANSCPUS. I can't say how often it occurs, but it has happened a couple of times. Maybe once every ... 20-30 test runs? Maybe? I'm thinking we should get rid of it (sysblk.txf_transcpus) altogether. What do you think?

s390guy commented 4 years ago

I have been following this discussion for a few days now with much interest. Fish has it right about the way TXF works with regards to Hercules mainstor.

There seems to have been a resolution to the register concerns, so I will not comment on that.

Hercules lacks a standard storage management system. While instruction execution is consistent with how it accesses memory, the channel system is not. Because absolute addresses are used by it, each device driver uses its own way to access memory. This is going to make it very difficult to implement TXF for I/O devices as it is done on physical mainframe systems. How is a conflict detected without a centralized way of detecting it??

Fish, if my document is of any value, please share it with the developers working in TXF.

dasdman commented 4 years ago

s390guy wrote:

While instruction execution is consistent with how it accesses memory, the channel system is not. Because absolute addresses are used by it, each device driver uses its own way to access memory.

On the contrary, it is the channel subsystem in Hercules that moves the data between the channel buffer and main memory. There are only a couple of locations that should be touched that are responsible for the actual memory moves within channel.c. Nothing else in channel.c needs to be touched, nor should it be touched.

Mark

Fish-Git commented 4 years ago

(combined reply)

s390guy wrote:

Fish has it right about the way TXF works with regards to Hercules mainstor.

Thank you, Harold.

Hercules lacks a standard storage management system.

That is true for the most part, yes. However, the way Bob designed his TXF implementation, our "standard storage management system" (at least as far as TXF is concerned) can now be thought of as being the dat.h maddr_l function (invoked by the MADDR and MADDRL macros), which is where the TXF conflict checking code lives.

And since virtually all guest storage access must go through either one of the vstore macros/functions in vstore.h or else directly themselves via the MADDR/MADDRL macros (which all of the vstore macros/functions call), I think we're good. I looked at all of that pretty closely and unless I missed something it looks pretty solid to me.

dasdman wrote:

s390guy wrote:

While instruction execution is consistent with how it accesses memory, the channel system is not. Because absolute addresses are used by it, each device driver uses its own way to access memory.

On the contrary, it is the channel subsystem in Hercules that moves the data between the channel buffer and main memory. There are only a couple of locations that should be touched that are responsible for the actual memory moves within channel.c. Nothing else in channel.c needs to be touched, nor should it be touched.

Thank you, Mark. You are of course correct. For those interested (e.g. @s390guy Harold), if you take a look at my cleaned up version of Bob's code, you'll see some TXF_FETCHREF and TXF_STOREREF macros sprinkled here and there at key places throughout channel.c code. As best as I was able to determine these are the only places where "channel" code accesses main storage. The device handlers only read or write data into or out of the I/O buffer that channel.c passes to it (which it manages externally from Hercules main storage/mainstor). The device handlers thus never actually directly access any main storage; only the channel.c code does.

It is the channel.c copy_iobuf function that is actually responsible for moving device I/O buffer data to/from Hercules main storage (mainstor). So again, I think we're good.

Thank you both for your excellent thoughtful feedback.

Fish-Git commented 4 years ago

@rwoodpd

*** (FRICK!) ***

HOUSTON, WE HAVE A PROBLEM!

I screwed up folks. My "bobclean" version of Bob's code is not exactly cosmetic. (frick!)

In the opcode.h source file I guarded his "CHECK_TRANCTR" macro definition with a #ifdef for "FEATURE_073_TRANSACT_EXEC_FACILITY". Bob's original code wasn't doing that.

The #ifdef should be for _"_FEATURE_073_TRANSACT_EXEC_FACILITY"_, not "FEATURE_073_TRANSACT_EXEC_FACILITY"! (i.e. underscore feature, not just plain feature!)

The impact of this f*ckup on my part is that my "bobclean" version is missing his "CHECK_TRANCTR" macro in the "EXECUTE_INSTRUCTION" macro expansion. The CHECK_TRANCTR statement is there, sure, but it gets expanded to nothing because the #define definitions for both the CHECK_TRANCTR and the EXECUTE_INSTRUCTION macros are in the Architecture INdependent section of opcode.h!

(Oops!)

The first part of opcode.h (everything before approximately line 716, where the "#endif /*!defined( _OPCODE_H )*/" statement is) gets compiled only once, and all macros #defined in that section get used for all build architectures.

All architecture DEPENDENT macros are #defined after line 716, and are dependent on the architecture currently being built (s370, s390, z900). In that section your #ifdefs should test for the normal plain non-underscored FEATURE_XXXX name, whereas #ifdef tests in the architecture INdependent section (everything before line 716) should always check for the underscored _FEATURE_XXX name instead.

LUCKILY the impact of this little SNAFU on my part is, I believe, relatively minor. This time. Next time it might not be, but I believe this time the impact is fairly minor since the missing TXF code (the TXF code that's not being generated before each instruction as a result of my boo-boo) isn't very critical IMO. Generally speaking, sure! It's critical! It definitely needs to be there. But... I'm doubting either z/OS or z/VM really cares. I'm guessing neither is relying on either one of the two constraints/features that the CHECKTRANCTR macro is checking for. So like I said, I think I got lucky this time. Next time I might not be so lucky. I'm just glad this was caught early before it made its way into production. (This is what happens when you review your own code a zillion times looking for problems, questioning everything.)_

WHICH BRINGS ME THE FOLLOWING MUCH MORE IMPORTANT ISSUE!

After spending a few moments longer to look at opcode.h a bit closer, it appears there is quite a bit of code (#defined macros) in the architecture independent section that technically needs to be moved to the architecture dependent section instead!

For example, I believe all of the following code:

https://github.com/SDL-Hercules-390/hyperion/blob/9007f6a90f2661a08afc149cd7d804fe5b3ed52d/opcode.h#L283-L439

should all be moved into the architecture dependent section of opcode.h instead:

https://github.com/SDL-Hercules-390/hyperion/blob/9007f6a90f2661a08afc149cd7d804fe5b3ed52d/opcode.h#L688-L695

As justification for why I feel they need to be moved, the _PSW_IA macro for example is casting to VADR which is architecture dependent, the PSW_IA macro is using ADDRESS_MAXWRAP which is architecture dependent, the SET_PSW_IA macro is using psw.IA and IA is architecture dependent, etc. Quite a number of them are also using PAGEFRAME_PAGEMASK too, which is also architecture dependent. Thus all of them IMHO should be moved to the architecture dependent section of opcode.h instead.

Would each of you concur (agree) with my assessments to both my TXF mistake and this opcode.h archdep/non-archdep issue I've discovered?

Feedback encouraged and appreciated!

Thanks.

(and my apologies to Bob!)

Fish-Git commented 4 years ago

FYI:

For those interested, I've re-uploaded my "fishtrans.zip" and "fish-txf.zip" files to my web site:

(Source) http://www.softdevlabs.com/downloads/fishtrans.zip
(Binaries) http://www.softdevlabs.com/downloads/fish-txf.zip

containing the latest set of changes:

opcode.h "archdep" fix discussed above.
OPTION_TXF_SLOWLOOP (speed-up? of run_cpu loop)
dyn76 obtain_lock workaround removed.

Please replace whatever version you may have previously downloaded with this latest and greatest and re-review and re-test.

Feedback appreciated but not required.

Thanks!

Fish-Git commented 4 years ago

FYI:

I'm currently working on my full report and analysis of my cleaned-up(?) and fixed(?)/improved(?) version of Bob's code that I promised to do earlier:

... once I'm done (a few more days/weeks?) I'll be posting my full report and analysis for peer review.

I hope to be done within the next few days -- at most -- at which time I will then post the download links here. (*)

(*) I figure rather than post a huge report here on GitHub it'd be better/easier for all involved for me to simple publish some download links to a .rtf/.pdf file instead.

Fish-Git commented 4 years ago

Apologies to Harold for forgetting to reply to this:

Fish, if my document is of any value, please share it with the developers working in TXF.

FYI: Everyone should have already received a copy of it included in amongst the attachment that I included at the very beginning of (the very first comment of) this GitHub issue:

Besides the description of the facility in the Principles of Operation manual, here are some additional papers I found on the subject to help get your creative juices flowing:

Transactional-Execution Facility.zip

My apologies to everyone else as well for my posting so many GitHub replies, making it seem like I'm purposely monopolizing the conversation/discussion here. It's purely unintentional I assure you! I'm just trying to make sure things which I feel are important aren't left unmentioned.

Fish-Git commented 4 years ago

Ivan wrote way back on Dec 1, 2019:

(Yes, I know there is no PIFC for TBEGINC since it doesn't have a TDB.)

I don't believe the TDB has anything to do with controlling Program Interrupt filtering. Program Interrupt filtering is controlled via the PIFC field of the i2 operand of the TBEGIN instruction (which is ignored for the TBEGINC instruction (as is the F Allow Floating-Point Operation field too)).

Besides, unless I'm misunderstanding/misreading things, constrained transactions (TBEGINC instruction) do have a TDB (sort of). It's just not specified in the instruction itself like it is for TBEGIN. Rather, the TDB is located at fixed storage location 0x1800 in low core. But of course it's only used if the transaction is aborted due to a Program Interrupt which, technically, should never happen for constrained (TBEGINC) transactions (but which our code allows for). (*)

(*) That is to say, when a transaction is aborted due to a Program Interrupt, the low core TDB is populated whether it was a constrained or unconstrained transaction that aborted. If it was a constrained transaction it is of course automatically retried (by virtue of the Transaction Abort PSW pointing to the TBEGINC instruction itself rather than to the instruction following the TBEGIN instruction like it is for unconstrained transactions).

Just wanted to make that clear.

Fish-Git commented 4 years ago

I'm currently working on my full report and analysis of my cleaned-up(?) and fixed(?)/improved(?) version of Bob's code that I promised to do earlier: [...] I hope to be done within the next few days -- at most -- at which time I will then post the download links here. (*)

Done! Didn't take me as long as I thought.

fish-vs-bobclean.pdf

It's not formatted very well, but I believe it covers everything.

Let me know what you think.

Fish-Git commented 4 years ago

FYI: ExamDiff Pro diff reports:

TXF-diff-reports.zip

rwoodpd commented 4 years ago

There are indeed two TDB blocks when a program interrupt occurs, filtered or unfiltered. I verified this on real hardware,

rwoodpd commented 4 years ago

Of course this is only true for unconstrained transactions where the caller passes the TDB address.

Fish-Git commented 4 years ago

Folks:

I have created a clone of current SDL Hyperion under my GitHub account (Fish-Git) called: "hyperion-fish-txf", and have added all of you as collaborators so that we all have commit access to it.

I started with current SDL Hyperion as/of 2019-12-31 commit 9007f6a90f2661a08afc149cd7d804fe5b3ed52d ("cosmetic: untab machdep.h") followed by a single commit of Bob's "raw" (original) code.

From there I made a long series of tiny incremental commits to eventually reach what I call the "bobclean" state: my (hopefully purely cosmetic!) cleaned-up version of Bob's original code.

Finally, from there, I then made a series of commits that I'm calling "fishfix" that are my own changes (modifications/fixes) to Bob's cleaned-up original code. That's where the repository is at now (but it does have a few changes I've made in the recent past few days that I haven't told you guys about yet).

I'm hoping we can use this repository to, TOGETHER, further develop Bob's impressive TXF implementation, to eventually be merged into SDL Hyperion once we're all happy with it.

This way I don't have to keep publishing my fishtrans.zip and fish-txf.zip and associated patch files each time I change something. Instead, we can ALL commit whatever changes we feel are needed to my "hyperion-fish-txf" repository instead, and then eventually merge all of those changes into our "production" SDL Hyperion repository once we're done.

Cool?

rwoodpd commented 4 years ago

I just brought up z/OS 2.4 using your version of the code. Hercules is crashing somewhere in the UPDATE_SYSBLK_TRANCPUS macro. I am looking into it

rwoodpd commented 4 years ago

Also the transaction level is 2 at this point, which is wrong.

rwoodpd commented 4 years ago

If I change regs to hostregs, everything works. Looking into why

rwoodpd commented 4 years ago

It looks like the problem is due to the txf fields in the REGS structure being after regs_copy_end, so that when copies of the regs structures are populated using sysblk.regs_copy_len, the transaction data is left behind. I moved the txf fields to before regs_copy_end, changed back to regular regs, and it is working now. Next up is to test running z/OS 2.4 as a z/VM guest. Will keep you posted

Fish-Git commented 4 years ago

Bob Wood said:

It looks like the problem is due to the txf fields in the REGS structure being after regs_copy_end, so that when copies of the regs structures are populated using sysblk.regs_copy_len, the transaction data is left behind. I moved the txf fields to before regs_copy_end, changed back to regular regs, and it is working now. Next up is to test running z/OS 2.4 as a z/VM guest. Will keep you posted

You need to clone my "hyperion-fish-txf" repository and use that instead. The re-positioning of the txf_xxxx fields to come before regs_copy_end (instead of after, where you originally had them) was already fixed. Just earlier today as a matter of fact, just before I posted my GitHub comment announcing the availability of my hyperion-fish-txf fork of SDL Hyperion:

_TXF: fishfix: move TXF fields in REGS before 'regs_copy_end':_ _Last few crashes have been in 'program_return' instruction, which appears to be the only one that copies REGS according to SYSBLK.regs_copylen. When a copy of REGS is made, I think the TXF fields need to be copied too!

That's why I'd like everyone to begin using this fork of mine to perform their TXF testing with. It's the most current version of TXF code available, with all of your original code (cleaned up of course) plus all of my various fixes too.

Thanks.

Fish-Git commented 4 years ago

FYI to everyone:

https://github.com/Fish-Git/hyperion-fish-txf/issues

Created first "Issue" just a few minutes ago.

Comments?

Fish-Git commented 4 years ago

Bob Wood said:

There are indeed two TDB blocks when a program interrupt occurs, filtered or unfiltered. I verified this on real hardware,

Of course this is only true for unconstrained transactions where the caller passes the TDB address.

Yeah, I mentioned that in my analysis and then went ahead and fixed it, so my repo already has the fix.

Fish-Git commented 4 years ago

On Jan 5, 2020 Bob Wood said:

As I stated earlier, it is a work in progress.

And I'm only now noticing how much work still needs to be done!!

FOLKS, WE HAVE A PROBLEM :(

According to the manual (page 5-97: "Restricted Instructions"):

Restricted instructions include all instructions that are not defined in Chapters 7-9 and 18-26 of this document ...

Which means ALL of the instructions in the following chapters are restricted:

Chapter 10: Control Instructions
Chapter 14: I/O Instructions

ADDITIONALLY, note 3 regarding "Constrained Transactions" (pages 5-107 and 108) states:

In addition to all instructions listed in the section "Restricted Instructions" on page 5-97, the fol- lowing restrictions apply to a constrained trans- action.

a. Instructions are limited to those defined in Chapter 7, "General Instructions."

HOLY CRAP!!

What this basically means is that virtual every single freaking instruction in z/Architecture (except for the small handful in the chapter 7 "General Instructions" category, about 240 in all) is restricted! That is to say, virtually ALL OTHER instructions need to have a CONTRAN_INSTR_CHECK macro in them, and all instructions in chapter 10 and 14 need to have a TRAN_INSTR_CHECK macro.

And I'm not seeing that. :(

I'm seeing _some (all of the I/O instructions in io.c do correctly have a TRAN_INSTR_CHECK macro for example),_ but even for those, some of them don't appear to be correct. For example, the DSG and DSGF instructions in general3.c both have CONTRAN_INSTR_CHECK, which is wrong. Both are chapter 7 General Instructions, which are allowed for Constrained Transactions! Why are we disallowing them?

THE BIG PROBLEM THOUGH, is the missing CONTRAN_INSTR_CHECK in virtually all other instructions, as well as the missing TRAN_INSTR_CHECK macro in all of the chapter 10 Control instructions in control.c.

Like I said, I fully realize, Bob, that you did state your implementation was "a work in progress". I understand that. I'm not blaming you for this. I'm not. It's not your fault or anyone's fault, really. It's just something that I only now noticed (realized) and felt I should bring to everyone's attention.

So all I'm really saying is that we still have a long row ahead of us with this project, folks. :(

A long one.

(sigh!)

Well, I guess I better get started on reviewing all of our defined instructions (DEF_INST) to make sure they all (except for the chapter 7 General Instructions of course) properly have either a TRAN_INSTR_CHECK or CONTRAN_INSTR_CHECK macro in them (which I'm thinking of maybe renaming to TXF_xxxx_CHECK and `CON_TXF_xxxx_CHECK for consistency).

wably commented 4 years ago

Following along here, of course...

A question I have about your latest post, Fish, is whether the rest of us that are not running the latest z/OS or z/VM will have to pay an overhead price within the emulator in support of this feature? Since you mentioned that a huge number of instructions will now need additional checking macros for possible transaction conditions, that's what brought this to mind. Is nearly every instruction now going to have to pay the price of a test (or two or three) every single time it is executed?

Regards, Bob

rwoodpd commented 4 years ago

The checks are there in my version of the code for all instructions.

Fish-Git commented 4 years ago

Bob Polmanter asked:

A question I have about your latest post, Fish, is whether the rest of us that are not running the latest z/OS or z/VM will have to pay an overhead price within the emulator in support of this feature?

Since you used the term "the latest" I'm presuming what you're asking is whether older versions of z/Architecture operating systems -- which don't use/require TXF -- are going to be impacted by our TXF support, and the answer to that question is, unfortunately, yes. :(

Whether your z operating system -- whether it's e.g. z/OS 1.10 or MyOwnWhizBangSuperZOpSys! -- requires or uses the Transactional-Execution Facility is immaterial. It will, unfortunately, be impacted. There will be a slight(?) performance hit across virtually all instructions in the architecture. (Well, all those that access storage for sure, but even those that don't will be impacted if they're amongst those which are "restricted" as far as TXF is concerned, which is virtually all of them except the chapter 7 General Instructions.)

Now 370 and 390 will of course not be impacted in any way, shape or form. TXF is only a z architecture feature. Both 370 and 390 will continue to run at the same speed as they do today. There should be ZERO performance impact on either of those two architectures.

But for z? Yep. Performance hit. Across the board. And there's really no way around it, unfortunately. I mean, think about it for a moment. TXF needs to come into play on every storage fetch or store! So our DAT logic, after translating a virtual address to a real address (or yanking the previously translated value from the TLB), must check to see whether what it is fetching or storing impacts any currently running transaction on another CPU. That's a requirement.

Now we're currently attempting to reduce the amount of overhead (the severity of the performance hit) to the absolute minimum by maintaining a count of the number of CPUs in the system that are currently in transactional execution mode (executing a transaction) and then, after translating an address, checking that count for a non-zero value, and if zero, simply returning the translated address to the caller (you can see this at the beginning of the transact.c txf_maddr_l function, called by the very end of our DAT address translation function maddr_l in dat.h), so the overhead in this case reduces to a single compare and branch test. But... if there are any CPUs -- any at all -- currently executing a transaction, then there is significant overhead involved during address translation. Significant. (And again, I can see no way around it.)

There is also going to be a tiny bit of additional overhead incurred (another single compare and branch test) by virtually EVERY SINGLE INSTRUCTION IN THE ARCHITECTURE (except the General Instructions in chapter 7) to deal with the imposing of TXF restrictions regarding certain instructions being disallowed while in transactional execution mode (and again there's no way around that either).

So yeah, unfortunately there is most definitely going to be a performance hit across the board for z/Architecture. No two ways around it. Hopefully we'll be able to reduce the hit to the absolute barest minimum in most cases (single compare and branch test), but unfortunately, if even one single CPU in the system starts a transaction, ALL of the other CPUs on the system are going to immediately begin taking a significant performance hit while that CPU's transaction is running.

Once the transaction ends, then the performance hit goes away (or rather reduces back to the single compare and branch test).

But while any CPU is running a transaction, ker-PLUNK! Immediate performance hit. Immediate precipitous performance hit.

And there's no way around it! Never enabling the facility (bits 73 and 50) is not going to help you any. The only way you can get around the "problem" (if you want to call it that), is to build yourself a special build of Hercules without TXF support (i.e. without either FEATURE_050_CONSTR_TRANSACT_FACILITY or FEATURE_073_TRANSACT_EXEC_FACILITY #defined feat900.h. That's the only way.

Sorry! :(

(But like I said, the performance hit should be tiny/minimal if the z operating system you're running never uses the TXF facility, but even if it doesn't there will unfortunately still be a wee performance hit nonetheless.)

Fish-Git commented 4 years ago

Bob Wood said:

The checks are there in my version of the code for all instructions.

That's good to know.

What isn't good to know is they're not in my copy of your code, so I'm going to have to add them, which is going to be a PITA. %-P

UNLESS... by "my version of the code" you mean your clone of my repo? (hyperion-fish-txf)

If so (please say yes), then by all means please commit and push your changes!

If however, you mean your own private (original?) txf code that you're still developing and testing with, then is there any way I can manage to convince you to send me all of the changes you've made to your code since you first released it? (So that I can then make the same changes to my hyperion-fish-txf repo that we're all supposed to be using?) Thanks!

I'd really prefer that everyone (including you!) use my hyperion-fish-txf repo -- and only that code -- to test/develop TXF with. As I explained when I announced its creation, I'd like to use it as our temporary "under development" version of Hercules so as to not impact our users currently relying on the stability of our current repository version (SDL Hyperion). This way once it's finished, once we're done (I was hoping we could all work on this together), we can then simply "git merge" all of our collective changes (commits) into our current repository, rather than have to build one huge patch and commit that to our repo as ONE huge commit (which is what we'd otherwise have to do).

That's why I'd rather all of us get on the bandwagon and start using my temporary hyperion-fish-txf fork to do all of our development and testing with.

Cool?

Does that make any sense?

Do others feel this is a sensible/reasonable approach?

rwoodpd commented 4 years ago

I have cloned hyperion-fish-txf to my desktop. I will make the changes and push it back, probably tomorrow (Tuesday) or the day after.

rwoodpd commented 4 years ago

DSG (Divide Single) is restricted for constrained transactions.

rwoodpd commented 4 years ago

control.c changes have been pushed.

moshix commented 4 years ago

Fish wrote:

Bob Polmanter asked:

A question I have about your latest post, Fish, is whether the rest of us that are not running the latest z/OS or z/VM will have to pay an overhead price within the emulator in support of this feature?

... <snip; a long detailed reply re: performance impact>

I have been checking in the last few days how much this extra few compare and branch instructions impact a z/OS 1.13 or z/VM 6.2 system. In short: on my laptop with an i7 CPU it's just not measurable. The impact of those extra few instructions is within the normal variance of benchmarks.

Of course none of these operating systems enabled TEF. I believe the first IBM system with TEF was the z12 (which z/VM 6.2 does support but perhaps without enabling TEF, or perhaps z/VM 6.2 needs to be installed fresh on a TEF-enabled machine, not sure).

In short without TEF in the OS, there is no impact that I can discern, at least on my laptop. I am traveling and can't check on other machines.

I have also been running VM/SP on my laptop and, without proper benchmarking, haven't felt anything different, which makes sense since S370 is not impacted at all.

thanks

ivan-w commented 4 years ago

TXF performance impact on restricted instructions should be minimal.. It's just a TEST/BRANCH case (when in z/Arch, otherwise no code is generated) which in most cases resolves to 0 or 1 CPU cycles on the host.

The load is having to go trough ALL the instructions and adding the checks.. There are a gazillion instructions that are restricted for Constrained instructions.. It is going to be a pain in the neck to do all these guys.

Just my $0.02

wably commented 4 years ago

Moshe and All,

Thanks for your testing to determine the overhead impacts of the TEF changes. I did suspect that the “cost” might only be a simple compare and branch and be mostly not measurable but I wanted to ensure that ‘performance’ was on the minds of those working on TEF.

The thing is, SDL Hyperion is already demonstrably slower than Hercules 3.13. Much of this speed loss is due to improved “correctness” in the ways emulation is being performed. While it is unfortunate to incur the cost it is important for it to operate correctly so the advanced OS’s work properly.

Another part of the speed loss is due to the continual adding of new features that must be supported. Every time some new thing, the “Blah Blah Blah Assist Feature” is added to z/Series and then it ends up in Hyperion there is a cost as a new check is added somewhere to see if the feature is active or not.

This is one reason why I have not pushed for the implementation of the missing PER G feature in the emulator (even though I was the one that opened the bug issue). It is because I know that if this feature were put in place it will exact a cost to almost every single instruction. In this case, it is not worth it for a feature that no one uses and no one apparently misses (since it hasn’t been there all along and no one asked for it).

It is easy to say that TEF only adds a small compare and branch and if it is not active then the overhead is next to nothing. I don’t disagree with that. What is concerning is that it is one test for TEF, and another test to see if Feature X is active, and yet another test to see if Facility Z is active, and so on. The next thing you know, there are ten tests for different things and now the overhead cost is starting to show up.

However – I don’t see a solution or way around it. These things need to be added or Hyperion will not be able to run the latest systems and will slowly become obsolete. It is the price of progress I suppose. You guys are doing good work on TEF and it is amazing it has progressed this far so quickly. It is good to know that you are already aware of the performance impacts and are keeping it in mind.

Regards, Bob

Fish-Git commented 4 years ago

Bob Wood said:

DSG (Divide single) is restricted for constrained transactions

(Urk!) You are of course correct. I now see that it is indeed specifically listed in note 3e on page 5-108. My apologies. I must have been tired. :(

(me: crawls away in embarrassment...)

Fish-Git commented 4 years ago

Bob Wood said:

control.c changes have been pushed.

Thanks!

Fish-Git commented 4 years ago

Bob Polmanter wrote:

... but I wanted to ensure that ‘performance’ was on the minds of those working on TEF.

It is, but only as Job Two. Job One of course is correctness of functionality. Always. It does little good afterall to arrive quickly at the wrong answer. We concentrate first on writing the code that provides the correct answer. Then afterwards we can tweak the code to try and make it faster (which I hope I have via my OPTION_TXF_SLOWLOOP design).

It is good to know that you are already aware of the performance impacts and are keeping it in mind.

Oh it's in mind! Always has been. It's just that it's only a secondary consideration. Correctness first. Performance second. Always.

Fish-Git commented 4 years ago

Bob Polmanter wrote:

This is one reason why I have not pushed for the implementation of the missing PER G feature in the emulator (even though I was the one that opened the bug issue). It is because I know that if this feature were put in place it will exact a cost to almost every single instruction. In this case, it is not worth it for a feature that no one uses and no one apparently misses (since it hasn’t been there all along and no one asked for it).

Yeah, PER G is still on my list, but you are right as to why I haven't bothered to try that hard to get it implemented. It is a deviation from the architecture, yes, but as you state, it's one of those features that is so rarely used and hasn't, thus far, been missed, that it's very low on the list of priorities.

It'll probably get implemented some day. Just not any day soon. There are probably a lot of things in that category.

Fish-Git commented 4 years ago

Bob Polmanter wrote:

Another part of the speed loss is due to the continual adding of new features that must be supported. Every time some new thing, the “Blah Blah Blah Assist Feature” is added to z/Series and then it ends up in Hyperion there is a cost as a new check is added somewhere to see if the feature is active or not.

Well... possibly for a few edge cases maybe, but for the most part that is not true. Only if the new feature or facility is controlled (enabled) by a bit in a control register for example, would we then have to check that bit each time to see if the facility or feature was programmatically enabled or not and then do something different. But, while admitting I do know know for sure, I suspect that such situations are, relatively speaking, fairly rare (and likely limited to instructions or code that isn't executed a zillion billion times a second).

If what you're referring to however is a newly defined facility for example, which defines some new instructions to go along with it, then that is no longer true with our new facility design (FT2 table in facility.c). Before the new facility design, yes, each of the new instructions associated with the new facility would have to have a FACILITY_CHECK macro to check up front whether or not the facility was enabled or not and program check if it wasn't (since the instruction isn't supposed to even exist if the facility in question doesn't exist), but all that went away with the new design.

Now, when a new facility is released that comes with it's own set of instructions, the code in facility.c will automatically disable those instructions at startup if the facility isn't enabled by default (by patching the opcode (instruction) table to route that instruction to an instruction function that immediately program checks), and automatically enables them ("undoes" its previous disablement) whenever the facility is manually enabled.

This relieves the Hercules developer from having to manually insert FACILITY_CHECK macros in each of the new instruction functions that the facility introduces, and thus makes the instruction run faster since it now doesn't need to bother to manually check whether the facility is enabled or not each time it's executed. (*)

So yes, you're right for certain edge cases, but for most cases (i.e. the new facility instruction case, which I'm guessing is most cases) you're wrong. There should be zero additional overhead for those cases.

(*) Yeah, yeah, as Ivan has already pointed out the actual host processor overhead of a simple TEST/BRANCH is in most cases nil (0 or 1 cycle, which is virtually none at all on the grand scheme of things), but it is extra overhead nonetheless. I mean, why burn an extra CPU cycle if you don't have to, right?

Fish-Git commented 4 years ago

SILLY QUESTION:

Which is the "more correct" abbreviation: TEF or TXF?

(Or maybe I should be asking, which abbreviation do each of you prefer? I myself obviously prefer TXF, but I've noticed others using TEF, which is why I'm asking.)

rwoodpd commented 4 years ago

UPDATE: z/OS 2.4 now comes up correctly under z/VM 6.4.

I had to make a one line change in sie.c to get it to work: a call to alloc_txfmap was needed for GUESTREGS right after the cpu_init call.

I will push the change.

Fish-Git commented 4 years ago

UPDATE: z/OS 2.4 now comes up correctly under z/VM 6.4.

Fantastic! Great progress!

I'm going to start work on a preliminary README.TXF document and begin adding some PTT trace statements and possibly a new t+txf command.

Peter-J-Jansen commented 4 years ago

I too have been following this thread with great interest, and must congratulate you all on this extraordinary success story! I'd love to try it out too, but unfortunately I don't have access to TXF capable OS's.

Cheers,

Peter

Fish-Git commented 4 years ago

Peter Jansen wrote:

I'd love to try it out too, but unfortunately I don't have access to TXF capable OS's.

Understood. But you do realize you can still of course "try it out" anyway, even with a non-TXF capable OS, right? You just won't be able to test/exercise our actual TXF logic, that's all. But you'll at least be able to verify for us that our new TXF-capable version of Hercules doesn't accidentally cause any problems with your non-TXF aware OS, right? (other than maybe performance)

That is to say, you won't be able to directly test the new TXF code, but you'll at least be able to verify we haven't somehow accidentally broken your non-TXF capable OS, right? You can still at least still do that for us, right?

For example, in addition to my crippled 2.4 loaner, I myself have also been testing with 1.10, 2.1 and 2.3 (as well as z/VM 6.3 and even DOS/VS too!) (and as far as I know none of them are TXF capable), and they all seem to run just fine (other than the larger than expected performance hit with the 'z' operating systems, which I wish we could do a better job at reducing).

Why don't you give it a try anyway and let us know how it goes! Let us know if you have any unexpected/unexplained problems as well as your personal thoughts on the performance. Thanks!

Fish-Git commented 4 years ago

Need some help folks!

https://github.com/Fish-Git/hyperion-fish-txf/issues

SDL-Hercules-390 / hyperion

Transactional-Execution Facility design / implementation #263