SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

ECKD DASD - Locate Record Write Track Operation (x'0B') Not Working Properly #601

Closed arfineman closed 8 months ago

arfineman commented 11 months ago

According to SC26-7298-01 page 39:

Locate Record must be followed by one Write Data CCW, and a number of Write CKD CCWs that is one less than the value in the Count parameter with the following exception:

  1. An Erase may be substituted for the last or only Write CKD command in the domain.
  2. If the Count parameter contains a value of 1, no Write CKD CCWs can follow the update data CCW. In this case the remainder of the track will be erased after the R0 data area has been updated.

Under Hercules exception 2 is not honored and track is not erased after R0 data is updated. Here is a sample channel program that fails (Completes with no errors, but does not erase the track as it should):

DEFXT    CCW1  X'63',DEFXTP,x'40',16                      
LOCRD    CCW1  X'47',LOCRDP,x'40',16                      
WRTDATA  CCW1  X'05',R0DATA,x'00',8                     

DEFXTP   DC    X'00C00000 00000000 00000000 00000000'  
LOCRDP   DC    X'0B000001 00000000 00000000 00000000'  
R0DATA   DC    XL8'00'                                                   

A simple test is to create a 1 cylinder disk with dasdinit -lfs T:/VOL123 3390 VOL123 1 and execute the above CCW chain.

The track 0 is not being erased.

I see in ckddasd.c routine ckd_write_ckd after every Write CKD an end-of-track marker is added. I think ckd_write_data routine should be updated to check for Write Data of R0 and the operation of Write Track('0B') and a count of 1. If so, an end-of-track marker should be added.

Best regards,

arfineman commented 11 months ago

I should have added the following paragraph on page 38 as well. It makes it clear that a count of 1 must clear the track:

Write Track (0B): This Operation code prepares the storage director to update write the data area of Record Zero and format the remainder of the track. The number of records to be formatted is one less than the value specified by the Count parameter. If the value in the Count parameter = 1, the remainder of the track is erased after the Record Zero data area has been updated.

arfineman commented 9 months ago

I was able to fix this with the update below to ckddasd.c for case: 0x05 (Write Data).

Best regards,

    case 0x05:
    /*---------------------------------------------------------------*/
    /* WRITE DATA                                                    */
    /*---------------------------------------------------------------*/

..
..
..

        /* Write data */
        rc = ckd_write_data (dev, iobuf, num, unitstat);
        if (rc < 0) break;

        /*****Start of added code **************************************/ 

        /* If lrcount=1 & r0 then erase rest of the track */

        if (1
            && (dev->ckdloper & CKDOPER_CODE) == CKDOPER_WRTTRK
            && dev->ckdcurrec == 0
            && dev->ckdlcount == 1
        )
        {            
            /* Write end of track marker */
            rc = ckd_erase (dev, iobuf, count, (int*)&size, unitstat);
            if (rc < 0) break;

            /* Return normal status */
            *unitstat = CSW_CE | CSW_DE;
            break;
        }

        /******End of added code **********************************/ 

        /* If track overflow, keep writing */
Peter-J-Jansen commented 9 months ago

Hi Aaron,

So this correction, together with the one for Issue #603, looks like this, right?

--- SDL-hyperion_4f007805_/ckddasd.c    2023-11-16 15:16:55.497072079 +0100
+++ SDL-hyperion_issue_601_and_603_fix/ckddasd.c      2023-11-19 14:10:22.622284514 +0100
@@ -3322,6 +3322,7 @@
             {
                 DefineExtent( dev, code, flags, chained, count, prevcode,
                               ccwseq, iobuf, more, unitstat, residual );
+                prevcode = 0x63;                              
             }
             else
             {
@@ -3339,10 +3340,12 @@
                               ccwseq, iobuf, more, unitstat, residual );
                 if (*unitstat != (CSW_CE | CSW_DE))
                     break; // (error!)
+                prevcode = 0x63;                    
             }

             LocateRecordExtended( dev, code, flags, chained, count, prevcode,
                                   ccwseq, iobuf, more, unitstat, residual );
+            prevcode = 0x4B;                                  
             break; // Done!

         case PFX_F_DE_PSF:
@@ -4045,6 +4048,24 @@
         rc = ckd_write_data (dev, iobuf, num, unitstat);
         if (rc < 0) break;

+
+        /* If lrcount=1 & r0 then erase rest of the track */
+
+        if (1
+            && (dev->ckdloper & CKDOPER_CODE) == CKDOPER_WRTTRK
+            && dev->ckdcurrec == 0
+            && dev->ckdlcount == 1
+        )
+        {            
+            /* Write end of track marker */
+            rc = ckd_erase (dev, iobuf, count, (int*)&size, unitstat);
+            if (rc < 0) break;
+
+            /* Return normal status */
+            *unitstat = CSW_CE | CSW_DE;
+            break;
+        }
+
         /* If track overflow, keep writing */
         offset = 0;
         while (dev->ckdtrkof)

I've successfully tested this, and, whilst it does not fix my issue #608 (as expected), it does no harm either. (I'll work on the I/O traces for issue #608 as soon as I can.)

Cheers,

Peter

wrljet commented 9 months ago

One comment on this... We're not going to check in code with:

prevcode = 0x63;

I have no idea what 0x63 means here. That'll want to be a constant equate someplace, and be commented as to what's actually going on where it's used.

Bill

arfineman commented 9 months ago

Hi Bill, 0x63 is the CCW code for Define Extent. There does not appear to be constants assigned to CCW codes, as they are referenced as constants through the code. I just posted this for reference. It's up to the development team to decide if they incorporate this or not and how they do it. Best regards,

 case 0x63:
    /*---------------------------------------------------------------*/
    /* DEFINE EXTENT                                                 */
    /*---------------------------------------------------------------*/

...
...

 if (0
            || chained == 0
            || (1
                && prevcode != 0x47 // LOCATE RECORD
                && prevcode != 0x4B // LOCATE RECORD EXTENDED
                && prevcode != 0xDE // READ TRACK
                && prevcode != 0xE7 // PREFIX
               )
arfineman commented 9 months ago

Hi Peter, Yes, the latest version is correct. I can provide a test case for both operations that will show a failure before the change and success afterwords. In regard to your situation, problem #608 , 'File Protect Error' means the channel program is violating the File Mask by going beyond the extents previously set. Generally, operating systems don't make this kind of mistake. Once I see the entire channel program I may be able to provide more insight. Best regards,

wrljet commented 9 months ago

Aaron, my apologies for jumping off the handle.

We do need to add the comments though, because very few people will recognize the opcodes.

As far as "the development team" goes, there isn't a lot of activity just now. Fish is on a vacation.

It's up to the rest of us to try to do the right thing, and I want to be as cautious as possible.

Some easy-to-reproduce test cases would be helpful. And practical evidence it doesn't break something else.

I myself never wander far from OS/360/MVS, so I'm no help here.

Bill

arfineman commented 9 months ago

Hi Bill, That is no problem at all. I'm very grateful for the tool you provided that allow me to make correction to my personal copy and generate the binaries for Windows. Something I never thought I would ever be able to do. Best regards,

wrljet commented 9 months ago

Is everything working for you now?

arfineman commented 9 months ago

Yes. This issue is now resolved. Thank you.

Fish-Git commented 8 months ago

Yes. This issue is now resolved. Thank you.

Aaron, I'm not seeing the fix anywhere in the code. That is to say, Peter's proposed fix does not appear to ever having been committed, so from the SDL Hercules development team's point of view, the problem has not been fixed yet.

May I presume that Peter's proposed fix is what needs to be committed?

arfineman commented 8 months ago

Hi Fish, Sorry for the delay. Yes, Peter's proposed updates resolves the issue. Best regards,

Fish-Git commented 8 months ago

Bill wrote:

One comment on this... We're not going to check in code with:

prevcode = 0x63;

I have no idea what 0x63 means here. That'll want to be a constant equate someplace, and be commented as to what's actually going on where it's used.

Aaron wrote:

Hi Bill, 0x63 is the CCW code for Define Extent. There does not appear to be constants assigned to CCW codes ...

I've thought about doing that myself, but I never got around to doing so yet. I think I actually like seeing the specific hex codes themselves used in the code instead. To me, seeing prevcode = 0x63; // DEFINE EXTENT or prevcode = 0x4B; // LOCATE RECORD EXTENDED is more self-documenting and programmer-friendly than, say, using prevcode = DEFEXT; or prevcode = LRE;. I know that sounds weird, but there are a lot of CCW opcodes, and coming up with meaningful/descriptive #define constant names for each of them might prove to be challenging.

And besides, it's a PITA to have to refer back to the header file that contains all of the #defines each time one wants to know what actual hex code is being compared for in the statement you're looking at. It sort of breaks your train of thought when you're trying to follow the logic of the code.

I realize that goes against normal good programming practice, but in this particular case I think as long as we're careful to always have a comment on each such statement that documents what CCW code is being compared (like we're already doing today), we'll be okay. If we ever start adding code that doesn't include such informative comments though, then yeah, that would of course be a problem.

It's a judgement call, really.

If you want to go ahead and do that, Bill (@wrljet) (or anyone else), feel free to do so. I certainly won't complain nor change it back. I'm just sort of in lazy mode right now, and as I said, I think our using comments is OK for now.   :)

Fish-Git commented 8 months ago

Aaron wrote:

Hi Fish, Sorry for the delay. Yes, Peter's proposed updates resolves the issue.

Great! I'll go ahead and commit the fix then. (And I'll be sure to add the appropriate identifying comment to each of the prevcode = ... statements too!)

Fish-Git commented 8 months ago

Aaron? (@arfineman) Quick question before I commit:

Should the check for 0xE7 // PREFIX be included in this fix too, or not? I'm thinking not. Yes? Please confirm!

I apologize, but I've been away for a while and trying to follow (get caught up with) each of the ckddasd.c GitHub Issues seems to be confusing me with respect to what changes fix what problems, without having to wade through many, many postings/comments. I want to make sure I'm not going to screw things up!

(Which is why I'd prefer to have access to your repository/fork so that doesn't happen!)

arfineman commented 8 months ago

Hi Fish, With this fix in, there is no need to check for 0xE7 anymore. Not here, not anywhere. But there is no harm leaving it in. Module ckddasd.c uses the prevcode variable to verify chaining requirement. With the added code, if a valid Define Extent is included in the Prefix CCW, variable prevcode is set to 0x63. If the Prefix is format 1, which must have a valid Locate Record Extended, variable prevcode is set to 0x4B to satisfy any chaining requirement for Locate Record Extended. Best regards,

Fish-Git commented 8 months ago

Reading through all of the comments in this GitHub Issue and looking at Peter's fix more closely (which you said was correct), I now believe Peter's fix is actually wrong/bad.

Near the beginning of this issue (3rd comment), you wrote that you fixed the problem by inserting the necessary code into the 0x05 (Write Data) CCW logic, just after the call to ckd_write_data, (which I also believe is the correct fix too).

HOWEVER...  Peter's proposed fix (patch) is actually erroneously inserting your new code into the 0x85 (Write Update Data) CCW logic, not the expected 0x05 (Write Data) CCW logic!  (Oops!)

This can be confirmed by looking at his actual proposed patch. As you will see, your new code is being inserted at line 4045(!), which is the wrong place!

Line 4045 is in the 0x85 (Write Update Data) CCW logic, not in the 0x05 (Write Data) CCW logic. The actual 0x05 (Write Data) CCW logic where your fix should actually go should actually be at line 3955!

Peter's patch is inserting your new proposed code into the wrong place!  (Oops!)

I will try to conform this by trying to throw together the simple stand-alone test case that you proposed in your issue's opening comments. It might take me a day or three, but I want to be damn sure we're committing the right fix in the right place! Thanks.


p.s. Peter? (@Peter-J-Jansen) I would appreciate it if you would confirm my findings please. Sometimes when you post patches in text form like we usually do, looks can be deceiving. I suspect Aaron saw just the code that was being inserted (and the two 'prevcode' statements) and thought to himself "That looks good to me!", but with him likely not being familiar with the format of patch (diff) files, didn't notice the actual line number of where the code was actually being inserted was wrong!


For Aaron's sake, the lines that start with @ (at sign) tell the patch program which line numbers (and for how many lines) in the old/new code the statements that follow it should be placed. As you can see, in Peter's bad patch, the lines were @@ -4045,6 +4048,24 @@, which is wrong.

Ref: https://www.google.com/search?q=format+of+diff+or+patch+files%3F Ref: https://stackoverflow.com/questions/987372/what-is-the-format-of-a-patch-file/987467#987467

arfineman commented 8 months ago

Hi Fish, If that would help I can send my copy of ckdsasd.c that has been tested. Unfortunately I don't have a git repository and keep things local and use the Hercules_Helper to build. Best regards,

Fish-Git commented 8 months ago

If that would help I can send my copy of ckdsasd.c that has been tested.

Yes please! That would certainly help a lot, for now.

Unfortunately I don't have a git repository and keep things local and use the Hercules_Helper to build.

Understood! But if you're interested, we can certainly help you to do that. Trust me, it's not that difficult or complicated. It would really help both us and yourself a lot in the future too. It really does make managing things much easier, and wouldn't impact your use of Hercules Helper at all. It's truly a win-win.

arfineman commented 8 months ago

Hi Fish, My local copy of ckddasd.c attached below. Best regards,

Fish-Git commented 8 months ago

Hi Fish, My local copy of ckddasd.c attached below. Best regards,

Thanks! Fix committed! (337ad61a6754f8ca51ae48b267a4faed7081b4fc). Closing issue.