Discussion: Flaw in MVS Assists

wably commented 3 years ago

Hey Fish, (@Fish-Git )

I’m inviting you to weigh in on a discussion about a problem in the MVS assist code in assist.c. First off, I’m not asking you fix anything or change anything. I already know what’s wrong and what the fix is and I can take care of it. I’m looking for an opinion about the solution.

Now a brief background. I’ve known for months that MVS/370 does not run well with 2 CPUs; it encounters all sorts of abends and system failures during start-up: the entire gamut from master scheduler abends, to communication task abends, to JES2 failures, and more. It runs perfectly on one CPU. During all of this time, I dismissed this as an MVS software bug as everything else works great with multiple CPUs: VM systems, and even the older MVS 3.8 TK4.

Very recently, it came to my attention from another user that is having problems with OS/390 1.2, and reporting to me very nearly identical symptoms, and yet perfect operation on one CPU. This caused a flurry of activity on my part to try to narrow it down – I’ll spare you the details and the analysis and elimination of things to try to locate the cause. Suffice to say, I found out that it was the MVS assists, the E5xx instruction emulation in Hercules. I already had trouble with these assists before (found and fixed last October).

There are four assist instructions that handle locking in MVS; two to obtain a lock and two to release a lock. It turns out that by altering the two obtain lock instructions to force a failure (that is, to force that the lock could not be obtained), it causes MVS to handle the situation himself and instead he obtains the lock via his own code and not via the assist. (There is precedent for this. In the VM ECPS assists if the assist can’t do what is required, the assist hands it back to CP and basically says, ‘here you do it’).

To summarize so far, changing the two obtain lock instructions E504 and E506 to return back to MVS with the indication that the lock could not be obtained causes MVS/370 and OS/390 1.2 to work perfectly with 2 CPUs. The two release lock assists are working fine and require no alterations.

In each of the two instructions, there is a single test to see if the lock is available. I altered the code

if (lock == 0 ….

to this

if (0)

in order to force the ‘else’ path: the lock could not be obtained.

At this point you are probably wondering why I just don’t fix it and be done with it. The problem is this: the code that would ordinarily obtain the lock isn’t working as demonstrated by these two operating systems. But the code looks correct. The document IBM Assists for MVS (GA22-7079-1) details exactly what the assist must do, and Jan’s code in assist.c looks like it is doing exactly what it is supposed to. I stepped this code; I cannot find a flaw in it. Yet two kinds of MVS won’t run properly with this code. (Note that MVS 3.8 TK4 did not use the assist so that’s why it runs ok with 2 CPUs, and OS/390 2.x and above does not use them either, and they run ok too. MVS/XA and MVS/ESA probably use this assist but I know of no one with these systems).

Here's what I would like to do and this is the part where I am soliciting your opinion. I would like to alter these two instructions to force the could-not-obtain path, and disable or comment out the part of Jan’s code that would obtain the lock. And surround this with a large and detailed comment block with my findings, in case five years from now someone shows up with a problem and hopefully we don’t have to reinvent the wheel. Jan’s code is worth keeping for reference, despite the unseen flaw, in other words. I can only assume that the assist specification may have changed after the 1981 document cited above; that was already the 2nd revision as it is. An unknown revision could explain the failures by MVS but until such a document turns up or someone shows up with MVS/ESA say and with it new information, the best course seems to be to disable the apparently flawed code and let the operating system do the function itself.

I’m asking for you to weigh in because you have shown a dislike for old code commented out or disabled. I get it, in part. It clutters things and may add to confusion depending. But sometimes certain code is extremely valuable as a reference. I just want to know that if I were to do this, that you won’t come along afterward and remove the disabled code or other alterations. Alternately, the disabled code could be removed out of the code path and relocated to the bottom of the source to reduce clutter, with a tiny comment indicating it was moved and to go to the bottom to find more information.

Regards, Bob

Fish-Git commented 3 years ago

Hi Bob!

First, your specific question: No, I promise to NOT to remove your commented out code -- if you should ultimately decide to end up doing what you propose to do: force the instruction to always take the "Sorry, I can't do it, please do it yourself" failure path.

HOWEVER...

Your claim that:

But the code looks correct. The document IBM Assists for MVS (GA22-7079-1) details exactly what the assist must do, and Jan’s code in assist.c looks like it is doing exactly what it is supposed to.

is something I must respectfully disagree with.

First, according to ALL of the manuals that describe the System/370 Extended Facility (GA22-7072-0, GA22-7072-1 (replaces -0) and GA22-7079-1), the description of e.g. each of the lock functions all contain verbiage along the lines of:

If the local lock in the ASCB addressed by the first-operand word is not held, the lock is replaced, by using an interlocked update, ...

but I am not seeing that anywhere in the code. What I am seeing is a simple vfetch4 followed later by a vstore4, which, to me, is not an interlocked update. If an interlocked update is required as part of the specification, then I would personally expect to see some type of "compare-and-exchange" type statement somewhere in the code (e.g. cmpxchg4). Yes? Or am I crazy?

Second, page 1 of manual GA22-7072-1 "IBM System/370 Extended Facility and ECPS:MVS" states:

The System/370 extended facility is available on all processor complexes and, as the extended feature, on some models of System/370. The extended-control-program support for MVS (ECPS:MVS) is installed on the IBM 4341 Processor.

leading me to believe that the facility was not available on all 370 models, only very specific ones, and we do not have any type of build constant #defined for this facility or any check anywhere whether the defined CPUMODEL is for one that supports this particular facility (which was apparently an enhancement that came much later on the 370 model line):

"IBM System/370: Subsequent enhancements"

Operating system specific assist, Extended Control Program Support (ECPS). extended facility and extension features for OS/VS1, MVS and VM. Exploiting levels of these operating systems, e.g., MVS/System Extensions (MVS/SE), reduce path length for some frequent functions.

"IBM System/370 Model 168: System/370 Extended Facility":

This optional facility of the 168-3 provides support for MVS/System Extensions (MVS/SE) and for the later MVS/System Product (MVS/SP).

I'm therefore wondering whether your tests would behave differently depending on what CPUMODEL you specify in your Hercules configuration file? Maybe if MVS knew it was running on a model that didn't support this particular assist, then maybe it wouldn't try executing them and would instead simply do the same thing itself?

Finally, the fact that your GUEST_CHECK() macro is checking the ECPSVM_CR6_VIRTPROB and ECPSVM_CR6_VMMVSAS bits of Control Register 6, and those bits are #defined in the "ecpsvm.h" header, leads me to believe these instructions were really only meant for execution under VM and not natively (which makes sense since they're all Control instructions requiring emulated supervisor state after all).

Yet, I am not seeing any type of SIE_MODE test anywhere in any of the instructions like I'm seeing in all of the other "VM Assist" type instructions defined in source file ecpsvm.c. YOU of course, being the VM Assist expert, would certainly know better than I would regarding this issue, but to me, it just doesn't seem right. Shouldn't there be a check for SIE_MODE somewhere in each of these instructions?

BOTTOM LINE: Given my various points listed above, I'm not convinced that the code is actually correct! I personally feel more research and experimentation is in order.

HOWEVER, if you want to commit your proposed "quick fix/workaround" in the mean time, I certainly wouldn't complain. Just document it like you suggested, that's all. Personally, I would #define a new OPTION_..... build constant in the "Research/Workaround build options" section of featall.h, which was created specifically for things such as this:

https://github.com/SDL-Hercules-390/hyperion/blob/2ca4bb57db155573962529b44ccd60fbf58fbf68/featall.h#L26-L51

wably commented 3 years ago

Hi Fish,

What I am seeing is a simple vfetch4 followed later by a vstore4, which, to me, is not an interlocked update. If an interlocked update is required as part of the specification, then I would personally expect to see some type of "compare-and-exchange" type statement somewhere in the code (e.g. cmpxchg4). Yes? Or am I crazy?

My understanding is that the OBTAIN_MAINLOCK(regs) statement at the start of each of these two instructions covers that need. If my understanding is incorrect and we need something else more in line with what you were suggesting, do tell.

I'm therefore wondering whether your tests would behave differently depending on what CPUMODEL you specify in your Hercules configuration file? Maybe if MVS knew it was running on a model that didn't support this particular assist, then maybe it wouldn't try executing them and would instead simply do the same thing itself?

Believe me, I've already tried a good number of CPUMODEL specifications. But none of that really matters because this version of MVS requires the assist be present, period. If you do not have the assist on your hardware, you will received a disabled wait PSW 34 immediately after IPL. In the system wait codes documentation for this release (GC38-1008-9), wait code 34 says:

An operating system that requires one or more of the following features has been loaded on a processor or version of microcode that does not include the feature(s).

System/370 processor, or 4300 processor with System/370 IML

System/370 Extended Facility

System/370 Extended Feature

I already know from prior experience that if the assist is unavailable, MVS will not IPL. To the point: it is not optional and so the CPUMODEL isn't the driving factor. It runs just fine whether the CPUMODEL is 3158, 4381, 9000, and more. If you take away the assist, it wont IPL at all.

Finally, the fact that your GUEST_CHECK() macro is checking the ECPSVM_CR6_VIRTPROB and ECPSVM_CR6_VMMVSAS bits of Control Register 6, and those bits are #defined in the "ecpsvm.h" header, leads me to believe these instructions were really only meant for execution under VM and not natively (which makes sense since they're all Control instructions requiring emulated supervisor state after all).

Those validations of ECPS CR6 bit settings by GUEST_CHECK() are simply to allow the MVS E5xx assist instructions to be executed in a S/370 virtual machine. The E5xx assists are all privileged instructions and return an operation exception in a virtual machine, because the real PSW is in problem state when a virtual machine user is running. VM does not provide simulation of E5xx and they are reflected as a PIC 01. ECPS:VM allows ECPS:MVS to coexist so that these E5xx assists can execute in real problem state (virtual supervisor state). GUEST_CHECK() simply ensures that the proper conditions are met to allow problem state execution of a privileged instruction, which is what all of ECPS:VM's VMA Assist does with LPSW, SSM, LCTL, and so on. In short, this has nothing to do with the problem at hand. And by the way, if you turn off the extended feature in a virtual machine (SET 370E OFF) causing ECPS:VM to disallow execution of E5xx in the real problem state, then MVS wont IPL, again, wait state 34. Before I added GUEST_CHECK() virtual machine execution of MVS/370 was impossible.

Yet, I am not seeing any type of SIE_MODE test anywhere in any of the instructions like I'm seeing in all of the other "VM Assist" type instructions defined in source file ecpsvm.c. YOU of course, being the VM Assist expert, would certainly know better than I would regarding this issue, but to me, it just doesn't seem right. Shouldn't there be a check for SIE_MODE somewhere in each of these instructions?

There does not need to be any checks for SIE_MODE in the E5xx assists for the same reason that everyday instructions like LR, AP, ICM and so on do not need SIE_MODE tests. They don't do anything that requires special handling by SIE or the underlying VM host. Under SIE (which in this context would only be for running MVS/370 in a virtual machine under VM/ESA), the machine is executing virtual machine privileged instructions natively as provided by the current SIE settings. SIE provides no intercepts for E5xx instructions and so while the virtual machine is in supervisor state under SIE it will execute them as if it were real hardware. It would be no different for say, an SSM instruction while under SIE (if the SIE intercept for SSM was turned off).

Within the ECPS:VM support provided by ecpsvm.c, the SIE checks are necessary because some of the ECPS assisted instructions, such as LPSW, SSM, etc., are SIE intercept-able, and so that intercept needs to be honored. Again there are no intercepts for E5xx defined to SIE and further, the actions of E5xx are MVS specific and do not require invention by the underlying VM host.

BOTTOM LINE: Given my various points listed above, I'm not convinced that the code is actually correct! I personally feel more research and experimentation is in order.

Well, I am not in a hurry here. I would be thrilled if an actual problem could be found and fixed rather than avoid the whole issue by having the assist just give it back to MVS. But I feel a bit like I have come to the end of the road. I think the existing code is operating correctly per the known specifications (the interlocked update point not withstanding) and I do not desire ripping out code that looks to be right. But I can only help but think there has been a revision in the specification in the years after 1981; that is really the only explanation. But no other documents have ever turned up. Thus, there were either no more revisions or IBM may have chosen to cease publication of those specifications. Just like they did for SIE. There are other assists too (such as the DAS assist) whose specifications that to my knowledge have never been published; it is simply known as a feature code on some processor.

That is why that I felt that disabling the two assist instructions but not removing them physically was the best option. It leaves the disabled (or commented out) code present for future re-consideration should new information suddenly turn up, whether that is in the form of a newly found document or someone with another operating system that can shed new light. In other words, there is uncertainty here! I am looking for a way to proceed for now without compromising the future if it should be proven we need to put the code back.

Thanks also for the tip about the Research/Workaround Build Options. This sounds ideal and I was not aware of it.

Regards, Bob

Peter-J-Jansen commented 3 years ago

Hi Bob,

I think I will need to apologize here, as Fish is right, and I may have caused the problem, quite some time ago though.

In case cmpxchg assists are available, OBTAIN_MAINLOCK(regs) -- and RELEASE_MAINLOCK(regs) as well -- will expand to nothing, thus becoming no-ops. And that is most likely the case. At that time I was correcting other locking problems, creating cmpxchg assists also for ARM processors, and believed I could ignore the four E5xx instructions in assist.c, as I thought they weren't used anymore. Mea culpa.

The correction is also as Fish already stated, using cmpxch4() if the compare-and-exchange operation needed is on fullwords. I could help you with that, as I am quite involved right now with cmpxchg4/8() working on an attempt at an alternative TXF implementation. I also remember that at the time I was not sure how the E5xx assists worked, as they're not in the current PoP.

I have a scanned PDF version of GA22-7079-01 IBM System 370 Assists for MVS, would that contain the best description? I must admit it is not very meaningful to me, the actual Hercules assist.c code is not very helpful either. I can't really figure out for sure what needs to be compared-and-exchanged atomically in any of the E5xx's. Once I'd know that, I may be able to show the cmpxhg4() implementation for it. It typically takes a "while( ... cmpxhg4( ...) ... ) { . .. }:" construct, sometimes with code inside {...}, but often not even that.

Please let me know how I could help you to correct my oversights.

Cheers,

Peter

wably commented 3 years ago

Fish,

I sent you a private email to your gmail account.

Bob

wably commented 3 years ago

Hi Peter,

Well, this is interesting. If OBTAIN_MAINLOCK is not actually locking then this could very well be the problem. I am not familiar with your cmpxchg assists so anything you can provide here will be very helpful, particularly in how to use them.

The scanned GA22-7079-01 document that you have is the correct one. It is a bit difficult to understand what the assist is trying to do in this document; the specific technical detail is quite hard to read and follow. It could have been worded more clearly for sure. But if you go over it enough and reread it and try to follow Jan's code (for say, E504 Obtain_LOCAL_Lock) it does start to make sense. The E506 CMS_Lock is more difficult; stick with LOCAL first. It also helps if you are familiar in general with MVS locks and how they are managed and used via the SETLOCK macro. If you are not, then the road will be more difficult.

Basically what is needed here is something akin to a COMPARE DOUBLE AND SWAP type operation on the word ASCBLOCK and the word containing the "highest-held-lock-indicator", or HHLI). If ASCBLOCK is zero and the LOCAL lock bit (x00000001) in the HHLI is zero, then swap in the processor address (from PSALCPUA) into ASCBLOCK, and set bit 31 in HHLI. All of this needs to be in one operation. Can cmpxchg do that? The two words to be operated on are not consecutive addresses in storage.

Regards, Bob

Peter-J-Jansen commented 3 years ago

Hi Bob,

You confirmed what I already feared after looking at Jan's code in assist.c, in that two fullwords are compared for a specific value, and when matched, both are updated in an atomic fashion. If these two fullwords are not consecutive, or within one quadword, then a simple use of cmpxchg4, 8, or 16, won't do. And I fear after a first glance that for CMS locks there is even more than just two fullwords.

A typical use of any cmpxchg() operation is as follows. Say an U32 fullword FW needs to be exchanged with the U32 value NEW, but only if prior to the exchange it has value OLD. The code would then look like this :

TEST = OLD;
NOT_CHANGED = cmpxchg4( &TEST, NEW, &FW );

If FW == TEST, then cmpxchg4() would perform a FW = NEW assignment and return 0; otherwise it would do a TEST = FW and return a 1, so that NOT_CHANGED becomes TRUE. And all of this in an atomic operation, implemented in hardware (Intel as well as ARM). Note that either FW or TEST are updated, hence that their addresses are to be passed to cmpxchg4(). If we'd know for sure that any other thread would execute similar code and that eventually FW would hold the value OLD, then a efficient spin loop goes like this :

while( cmpxchg4( &TEST, NEW, &FW ) ) { TEST = OLD };

So far for some typical simple cmpxhgXX() uses. I don't think they can directly solve the E5xx assists, but at the same time find a MAINLOCK(regs), even if it still did what it was supposed to do, extremely inefficient: it would need ALL CPU's and theoretically ALL other storage activity (channels) to be frozen during the obtaining and releasing local or CMS locks.

Textbooks talk about locks in a hierarchical fashion to avoid deadlocks. As an example lock A and lock B must always be acquired in a specific order, e.g. first one gets lock A, and only then lock B. Releasing must go in reverse order. All processes requiring A and B follow the same ordering rule, then there will be no deadlock. Perhaps that's the approach we'd need. I.e. first ascertain that ASCBLOCK is zero, cmpxchg4 it with the processor address, and only thereafter ensure the HHLI bit is zero and then cmpxhg4() set it to 1, then this could perhaps work. Unless of course other uses of these two lock operators interfere, in which case it'd be easier perhaps to define a Hercules E504_5 lock to implement the local lock, and E506_7 for the CMS lock. At any rate, the MAINLOCK(regs) approach looks very inefficient to me.

Sorry again for having caused this trouble Bob.

Cheers,

Peter

wably commented 3 years ago

Hi Peter,

Maybe this could still work with just a cmpxchg4. Because the IBM document only says that ASCBLOCK requires an interlocked update. Perhaps I could code up a cmpxchg4 for ASCBLOCK and if successful then set the LOCAL bit to 1 in the HHLI.

Taking it a little further, I think that maybe that is how MVS did it too. Because think about it, back in the S/370 day it only had CS and CDS instructions available for fullword or doubleword locked updates. Since ASCBLOCK and the HHLI word are not consecutive, MVS could not use CS or CDS on both of these in a single operation. So perhaps the key is to do just the interlocked update on ASCBLOCK. And then if that succeeds only then set the LOCAL lock bit. And of course, a similar operation for the CMS lock.

It wont hurt to try. I'll set it up and work with it this afternoon.

Regards, Bob

Peter-J-Jansen commented 3 years ago

Hi Bob, Oops, I forgot but now I remember: there still is a MAINLOCK_UNCONDITIONAL which might solve the problem if it replaces the older _MAINLOCK calls.

Cheers, Peter

wably commented 3 years ago

Hi Peter,

I think what I proposed in my prior post wouldn't actually work. Some other ASCB could have the lock, and so my own ASCB lock would be 0 and the cmpxchg would succeed and only afterward would I find that the LOCAL bit in HHLI is already set.

Instead, I think HHLI is the word that needs the cmpxchg and needs to be done first. Then the ASCBLOCK could be set after. This kind of contradicts the way I am reading GA22-7079-1 though.

Hmmm...

Bob

Fish-Git commented 3 years ago

Bob wrote:

Fish wrote:

What I am seeing is a simple vfetch4 followed later by a vstore4, which, to me, is not an interlocked update. If an interlocked update is required as part of the specification, then I would personally expect to see some type of "compare-and-exchange" type statement somewhere in the code (e.g. cmpxchg4). Yes? Or am I crazy?

My understanding is that the OBTAIN_MAINLOCK(regs) statement at the start of each of these two instructions covers that need.

(Doh!) Believe it or not, I completely missed that! I was so focused on the non-interlocked-updating of ASCB (i.e. the separate vfetch and vstore), my eyes didn't even see the MAINLOCK calls! Sorry about that.

HOWEVER...

Peter wrote:

In case cmpxchg assists are available, OBTAIN_MAINLOCK(regs) -- and RELEASE_MAINLOCK(regs) as well -- will expand to nothing, thus becoming no-ops. And that is most likely the case. At that time I was correcting other locking problems, creating cmpxchg assists also for ARM processors, and believed I could ignore the four E5xx instructions in assist.c, as I thought they weren't used anymore. Mea culpa.

and:

Oops, I forgot but now I remember: there still is a MAINLOCK_UNCONDITIONAL which might solve the problem if it replaces the older _MAINLOCK calls.

Which certainly makes it sound like the proper(?) fix might be as simple as replacing all of your OBTAIN_MAINLOCK and RELEASE_MAINLOCK calls with OBTAIN_MAINLOCK_UNCONDITIONAL and RELEASE_MAINLOCK_UNCONDITIONAL calls instead?

<shrug>

Worth a try anyway IMO.

wably commented 3 years ago

Hi Fish, Peter,

The UNCONDITIONAL mainlock calls to obtain and release did not resolve the issue. I used UNCONDITIONAL for all four assists that invoke obtain/release mainlock.

I'm wondering how the PLO instruction was implemented in Hercules. This instruction can do locked updates on multiple non-contiguous storage areas. The answer here might be what I need to do for the two words of the MVS lock sequence.

Regards, Bob

wably commented 3 years ago

The PLO instruction uses OBTAIN_MAINLOCK_UNCONDITIONAL. So it should work in assist.c too. Something else isn't right yet.

Bob

wably commented 3 years ago

I double checked my work and I have an OBTAIN_MAINLOCK_UNCONDITIONAL and a RELEASE_MAINLOCK_UNCONDITIONAL pair for each of the four locking instructions. Both MVS/370 and OS/390 v1.2 fail as before with this coding. This makes me question if mainlock UNCONDITIONAL is working correctly, and therefore how safe PLO is? I bring this up only because if PLO is not safe then maybe it would explain some of the weird TXF issues.

Just making an observation; not trying to stir up anything and waste time of others.

Regards, Bob

wably commented 3 years ago

Hi Peter,

This cmpxchg4 sequence appears to work well. I commented out the original test for the E504 LOCAL lock in Jan's code as shown here:

    /* Obtain the local lock if not already held by any CPU */
    if (lock == 0
        && (hlhi_word & PSALCLLI) == 0)

And replaced it with this cmpxchg4 sequence:

        /* Get mainstor address of ASCBLOCK word */
        main2 = MADDRL (ascb_addr + ASCBLOCK, 4, b2, regs, ACCTYPE_WRITE, regs->psw.pkey);

        old = 0;     // ASCBLOCK should be 0
        new = lcpa;  // Replacement value on successful exchange

        /* MAINLOCK may be required if cmpxchg assists unavailable */
        OBTAIN_MAINLOCK( regs );
        {
            /* Attempt to exchange the values; swap success returns 0, swap failed returns 1 */
            if (!cmpxchg4( &old, new, main2 ))
            {

If the swap is successful,

the LOCAL bit is set in the HHLI,
the store of the CPU address (variable lcpa) into ASCBLOCK by the original code is commented out, because the swap already stored it.

If the swap is not successful, the 'else' logic is unchanged from the original and the assist returns for MVS to handle it.

I only coded this for the Obtain Local Lock case. I have not done the Obtain CMS Lock case; that one is still set to 'if (0)' to force the else case. This one is going to be somewhat tougher to work out.

Both MVS/370 and OS/390 v1.2 look good.

I also had to consult MVS documentation about the LOCAL and CMS locks; I was mistaken about what I wrote before that if one ASCB had the lock and another tried to get it. Instead, there is one LOCAL lock per address space. The ASCB lock word isn't to protect from other address spaces; rather, it is to protect from TCBs in the same address space and from system services or user programming running on those TCBs. Thus, it is perfectly allowable to set the bit in HHLI without an interlocked update. ASCBLOCK is the only interlocked word, and that is also the way the Assist documentation reads.

Regards, Bob

Fish-Git commented 3 years ago

This cmpxchg4 sequence appears to work well.

Both MVS/370 and OS/390 v1.2 look good.

Fantastic! Glad to hear you got it figured out.

I'll leave you alone now since it sounds like you have things well in hand.

Thanks for your effort, Bob! Much appreciated buddy, :)

wrljet commented 3 years ago

All you guys are amazing! The knowledge, ability, and sticktoitiveness / Durchhaltevermögen in this crowd is very impressive.

Peter-J-Jansen commented 3 years ago

Hi Bob,

Excellent! I hope the other E5xx assists can be treated in a similar fashion.

I now also recall some of my previous related work. My oldest commit in this area was in March 2016 (commit 8c4be4eeb811d815e2d4f10b7e6298322e3c0c36), where I found the TS instruction on an ARM RaspberryPi looping. It was not yet using a hardware assisted cmpxchg at that time.

The _MAINLOCK_UNCONDITIONAL I created later (commit e0052fd96631dcb1a33439e60a80d5501bca05b0) in October 2019, identical to the original _MAINLOCK.

The _MAINLOCK protected version of the TS instruction protected the TS instruction all right, but not any other instruction that would also modify the same storage, like a simple MVI. That is why I think that _MAINLOCK protections can be problematic. So yes, wondering about the PLO instruction is, I think, very much to the point. Actually, I fear current Hercules PLO implementation may be faulty. Also interesting that you mention TXF, I even suspect PLO is perhaps now no longer used. (Is there an easy way to find out??)

Cheers,

Peter

wably commented 3 years ago

Hi Peter,

PLO has been around for a long time; I'm pretty sure it is being used quite a bit. During my career we even used it in our own company written multi-tasking programming because it is so much more flexible than CS or CDS, primarily because of its ability to lock on words that are in different areas of storage and non-contiguous.

If you want to be sure it is still used in a TXF capable system, I'd suggest using the CP TRACE capability of z/VM. Run your z/OS TXF system in a virtual machine and issue a trace command like this:

#CP CPU ALL CP TRACE INST DATA EE RUN

Your z/OS machine will slow way down but if any PLO instructions are executed (opcode = EE) then you will see them on the virtual machine console.

You could start the trace before you IPL z/OS and let the trace run through the IPL (it will take forever), or you could bring up z/OS normally and start the trace later and then do some activities to see if they use PLO.

When you want to turn off the trace, just use:

#CP CPU ALL TRACE END

Another way to approach this might be to avoid using z/VM and instead temporarily modify Hercules PLO instruction entry code in general2.c and simply write a basic message to the Hercules console each time it is invoked. All you want to know is that it was invoked; you don't need any specific information about the characteristics of the instruction. That would be a heck of a lot faster to obtain your results than using CP TRACE.

Regards, Bob

mcisho commented 3 years ago

I think you would be overwhelmed with output messages in either case! Issue #CP TRACE COUNT before the TRACE INST, and there will be no messages on the virtual machine console, VM simply counts the number of times the event occurred. Issue #CP TRACE COUNT intermittently (the count will be reset to zero each time) to see the number of times PLO was used.

Fish-Git commented 3 years ago

{
    static bool didthis = false;
    if (!didthis)
        LOGMSG("+++ PLO! +++\n");
    didthis = true;
}

Only just a few days ago I used the above technique to determine that z/OS never executes the PFMF instruction.

wably commented 3 years ago

Fish, Peter,

Thanks for your help! Closed by commit 51270dd4e7773cd1067891a68d692d37f078209e

SDL-Hercules-390 / hyperion

Discussion: Flaw in MVS Assists #391