dasdseq enhancement request

SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

dasdseq enhancement request #614

Closed GregoryTwin closed 7 months ago

GregoryTwin commented 7 months ago

At the moment, dasdseq is able to extract data from a z/OS sequential dataset using either ASCII or binary mode. However, binary mode is only useful for a very limited number and type of datasets (TSO TRANSMIT output, for example), because the extracted byte stream carries no record or block boundaries.

I propose implementing a third dasdseq mode, called e.g. "RECORD" or "BLOCK" mode. This would be the same type of mode that FTP currently uses when "quote mode b" is specified when uploading or downloading a file, where each data block is simply prefixed with x'80' followed by a 2-byte block length.
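To illustrate the proposed prefix, here is a minimal Python sketch (not dasdseq code). The function name `frame_block` is made up for this example, and the 2-byte length is assumed to be big-endian, which is what the x'800050' prefix reported later in this thread for 80-byte records suggests.

```python
import struct

EOR = 0x80  # descriptor byte from the request (x'80')


def frame_block(data: bytes, descriptor: int = EOR) -> bytes:
    """Prefix data with a 1-byte descriptor and a 2-byte big-endian length.

    Hypothetical helper for illustration only; not part of dasdseq.
    """
    if len(data) > 0xFFFF:
        raise ValueError("data too long for a 2-byte length field")
    return struct.pack(">BH", descriptor, len(data)) + data


framed = frame_block(b"\x00" * 80)
print(framed[:3].hex())  # 800050
```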

Using this new dasdseq extraction mode, transferring datasets between Hercules and a real mainframe could become dramatically easier and faster:

On z/OS under Hercules, we could use ADRDSSU to dump any data to a sequential dataset.
With the improved dasdseq, we could then extract that sequential dataset to a host system file.
On the host system, we could then use FTP "quote mode b" (i.e. a binary transfer in block mode) to directly upload that file to the real mainframe.
And then of course on the real mainframe, we would then restore the original dataset via ADRDSSU.
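The upload step could in principle be scripted as well. The following is only a sketch using Python's standard `ftplib`, under the assumption that sending `MODE B` and then streaming the already-framed file behaves like the manual "quote mode b" followed by "put". It has not been verified against a real z/OS FTP server, and the host, credentials, and dataset name shown in the comment are placeholders.

```python
from ftplib import FTP


def upload_block_mode(ftp: FTP, local_path: str, remote_name: str) -> None:
    """Upload a file that is already in block-mode format.

    Assumption: the server accepts MODE B and then interprets the
    3-byte headers that are already present in the local file.
    """
    ftp.sendcmd("MODE B")  # the command that 'quote mode b' sends
    with open(local_path, "rb") as fp:
        ftp.storbinary(f"STOR {remote_name}", fp)


# Hypothetical usage (placeholder host, user, password, and names):
#   ftp = FTP("mainframe.example.com")
#   ftp.login("userid", "password")
#   upload_block_mode(ftp, "dump.bin", "'HLQ.DUMP'")
#   ftp.quit()
```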

We can of course accomplish the same thing today, but it unfortunately takes two separate FTPs in order to do so:

On z/OS under Hercules, we use ADRDSSU to dump any data to a sequential dataset.
Using FTP "quote mode b" (a binary transfer in block mode) on our host system, we then download this sequential dataset from z/OS (running under Hercules) to our host system.
Then, again using FTP "quote mode b" on our host system, we upload this file to the real mainframe.
And finally, on the real mainframe we then restore the original dataset via ADRDSSU.

Given how slow FTP is compared to dasdseq (especially for large datasets), this new dasdseq extraction mode would thus allow datasets to be uploaded or downloaded to/from real mainframes much more quickly and easily than without it.

Fish-Git commented 7 months ago

Hi Gregory!

That sounds like a really good idea! I will try to get started on making that change for you right away! Thank you for suggesting it! :)

(Note: it might take me a little while to complete, as I am currently busy with other things at the same time, but I will let you know when I am done so you can test it.)

Fish-Git commented 7 months ago

Gregory (@GregoryTwin),

SOME QUESTIONS:

I notice that dasdseq currently writes its output one logical record at a time. That is to say, currently it reads one "block" of data from the input dataset, and then writes multiple "lrecl" records to the output file (i.e. fixed-block mode). Am I to presume you are requesting that "record" mode would prefix each logical record with X'80' + XL2'reclen' + record, whereas "block" mode would do the same thing but for the entire block only? Yes?
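The two interpretations in the question above can be compared with a small Python sketch. It is illustrative only: the helper names are invented here, and a big-endian 2-byte length is assumed.

```python
import struct


def frame(data: bytes) -> bytes:
    """X'80' + 2-byte big-endian length + data (assumed layout)."""
    return struct.pack(">BH", 0x80, len(data)) + data


def frame_by_record(block: bytes, lrecl: int) -> bytes:
    """'record' interpretation: one header per logical record."""
    return b"".join(
        frame(block[i:i + lrecl]) for i in range(0, len(block), lrecl)
    )


def frame_by_block(block: bytes) -> bytes:
    """'block' interpretation: one header for the whole block."""
    return frame(block)


block = bytes(800)  # ten 80-byte records in one block
print(len(frame_by_record(block, 80)))  # 830 (ten 3-byte headers)
print(len(frame_by_block(block)))       # 803 (one 3-byte header)
```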

dasdseq does not currently write any file output when the -ascii option is specified. Instead, it simply displays each translated logical record to the console/terminal (i.e. to stdout) as a regular Hercules message:

dasdseq -ascii "Q:/111111.cckd" LARGE2.DATASET
03:08:15.778 dasdseq 111111.cckd LARGE2.DATASET started; process-id = 1544 (0x00000608)
03:08:15.816 HHC02499I Hercules utility DASDSEQ - Sequential DSN unload - version 4.7.0.11062-SDL-DEV-gfd3a6a77
03:08:15.816 HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others
03:08:15.816 HHC01417I ** The SDL 4.x Hyperion version of Hercules **
03:08:15.816 HHC01415I Build date: Feb  8 2024 at 03:06:54
03:08:15.841 HHC00476I 0:0000 CCKD64 file 'Q:/111111.cckd': opened r/o
03:08:15.841 HHC00470I 0:0000 CCKD64 file 'Q:/111111.cckd': model 3390-27 cyls 32760 heads 15 tracks 491400 trklen 56832
03:08:15.932 HHC02691I ascii> '000000000000001'
03:08:15.932 HHC02691I ascii> '000000000000002'
03:08:15.932 HHC02691I ascii> '000000000000003'
03:08:15.932 HHC02691I ascii> '000000000000004'

Should the new "record" mode also apply (be valid) when the -ascii option is specified, and cause dasdseq to write its translated records to the output file instead? (i.e. not to the console/terminal?) Yes? Or should the -ascii, -block and -record options all be considered to be mutually exclusive to one another?

Thanks.

GregoryTwin commented 7 months ago

Thank you very much for your attention to the request! Please see my thoughts below:

1. I suppose dasdseq "-block" mode should write data exactly the same way as "ftp get" does when "quote mode block" is in effect. For a member of a partitioned dataset with RECFM=FB LRECL=80 BLKSIZE=27920, "ftp get" writes logical records prefixed with X'800050'. z/OS ftp does not have "quote mode record"; there is just "quote mode block". But IBM documentation sometimes calls this "record mode" as well as "block mode", probably because of its functionality.
2. I guess writing translated records (-ascii) to the output file could be the subject of another enhancement.
3. "should the new "record" mode also apply (be valid) when the -ascii option is specified": Definitely NO! -ascii and -block should be mutually exclusive. -block assumes binary mode.

Let me try to explain once more. RECFM=F and RECFM=V datasets can still be used after being transferred in binary mode, because in both cases we are able to split the data stream back into the original records (for F, just split by LRECL; for V, split using the lengths kept in the block and record descriptor words, BDW and RDW).
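As a rough Python sketch of that splitting: this assumes the standard 4-byte BDW and RDW whose first two bytes hold a big-endian length that includes the descriptor itself, and it ignores spanned records and extended BDWs. The function names are made up for this example.

```python
import struct


def split_fixed(stream: bytes, lrecl: int) -> list[bytes]:
    """RECFM=F/FB: every record is exactly LRECL bytes long."""
    return [stream[i:i + lrecl] for i in range(0, len(stream), lrecl)]


def split_variable(stream: bytes) -> list[bytes]:
    """RECFM=VB: walk each BDW, then the RDWs inside that block.

    Returns the record data without the 4-byte RDWs.
    """
    records = []
    pos = 0
    while pos < len(stream):
        (block_len,) = struct.unpack_from(">H", stream, pos)  # BDW
        if block_len < 4:
            raise ValueError("invalid BDW")
        block_end = pos + block_len
        pos += 4
        while pos < block_end:
            (rec_len,) = struct.unpack_from(">H", stream, pos)  # RDW
            if rec_len < 4:
                raise ValueError("invalid RDW")
            records.append(stream[pos + 4:pos + rec_len])
            pos += rec_len
    return records
```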

So, -block mode is not really necessary for RECFM F or V datasets. However, without -block mode, RECFM=U datasets cannot be processed at all, because block boundaries are lost and there is no way to split the data stream correctly. The -block mode is therefore especially useful for ADRDSSU DUMP datasets (RECFM=U).
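A small Python sketch of why the prefix matters for RECFM=U: once blocks of arbitrary size are concatenated, nothing in the bytes says where one block ended, whereas the framed form can be split again. The helper names and the big-endian length are assumptions made for this illustration.

```python
import struct


def frame(data: bytes) -> bytes:
    """X'80' + 2-byte big-endian length + data (assumed layout)."""
    return struct.pack(">BH", 0x80, len(data)) + data


def unframe(stream: bytes) -> list[bytes]:
    """Recover the original blocks from a framed byte stream."""
    blocks = []
    pos = 0
    while pos < len(stream):
        _descriptor, count = struct.unpack_from(">BH", stream, pos)
        pos += 3
        if pos + count > len(stream):
            raise ValueError("truncated block")
        blocks.append(stream[pos:pos + count])
        pos += count
    return blocks


blocks = [b"A" * 5, b"B" * 3, b"C" * 7]  # RECFM=U: arbitrary block sizes
raw = b"".join(blocks)                   # plain binary: boundaries are gone
framed = b"".join(frame(b) for b in blocks)
print(unframe(framed) == blocks)         # True
```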

GregoryTwin commented 7 months ago

(Oops!) I'm completely wrong with point 3 above! Sorry for that!

I've just tried ftp:

ftp mf
logon
password
ascii
quote mode b
get aaa

It works for ascii mode too! I got translated records prefixed with x'800050'. So it would probably be better to allow -block mode with -ascii, to provide the same behaviour as ftp.
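What that ftp result looks like can be imitated in Python. This sketch assumes EBCDIC code page 037 via Python's built-in `cp037` codec, which is not necessarily the translation table that ftp or Hercules uses, and the function name is invented for the example.

```python
import struct


def frame_ascii_record(ebcdic_record: bytes, codepage: str = "cp037") -> bytes:
    """Translate one EBCDIC record to ASCII, then add the 3-byte prefix.

    The code page is an assumption; real tools may use another table.
    """
    text = ebcdic_record.decode(codepage)
    ascii_bytes = text.encode("ascii", errors="replace")
    return struct.pack(">BH", 0x80, len(ascii_bytes)) + ascii_bytes


record = "HELLO".ljust(80).encode("cp037")  # an 80-byte EBCDIC record
out = frame_ascii_record(record)
print(out[:3].hex(), out[3:8])  # 800050 b'HELLO'
```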

GregoryTwin commented 7 months ago

I've also noticed that the last block of data transferred in "block mode" is prefixed with x'40....' rather than x'80....'.

So I had a look at section 3.4.2 on pages 21-22 of RFC 959 (File Transfer Protocol):

      3.4.2.  BLOCK MODE

         The file is transmitted as a series of data blocks preceded by
         one or more header bytes.  The header bytes contain a count
         field, and descriptor code.  The count field indicates the
         total length of the data block in bytes, thus marking the
         beginning of the next data block (there are no filler bits).
         The descriptor code defines:  last block in the file (EOF) last
         block in the record (EOR), restart marker (see the Section on
         Error Recovery and Restart) or suspect data (i.e., the data
         being transferred is suspected of errors and is not reliable).
         This last code is NOT intended for error control within FTP.
         It is motivated by the desire of sites exchanging certain types
         of data (e.g., seismic or weather data) to send and receive all
         the data despite local errors (such as "magnetic tape read
         errors"), but to indicate in the transmission that certain
         portions are suspect).  Record structures are allowed in this
         mode, and any representation type may be used.

         The header consists of the three bytes.  Of the 24 bits of
         header information, the 16 low order bits shall represent byte
         count, and the 8 high order bits shall represent descriptor
         codes as shown below.

         Block Header

            +----------------+----------------+----------------+
            | Descriptor     |    Byte Count                   |
            |         8 bits |                      16 bits    |
            +----------------+----------------+----------------+

         The descriptor codes are indicated by bit flags in the
         descriptor byte.  Four codes have been assigned, where each
         code number is the decimal value of the corresponding bit in
         the byte.

            Code     Meaning

             128     End of data block is EOR
              64     End of data block is EOF
              32     Suspected errors in data block
              16     Data block is a restart marker

         With this encoding, more than one descriptor coded condition
         may exist for a particular block.  As many bits as necessary
         may be flagged.

         The restart marker is embedded in the data stream as an
         integral number of 8-bit bytes representing printable
         characters in the language being used over the control
         connection (e.g., default--NVT-ASCII).   (Space, in the
         appropriate language) must not be used WITHIN a restart marker.

         For example, to transmit a six-character marker, the following
         would be sent:

            +--------+--------+--------+
            |Descrptr|  Byte count     |
            |code= 16|             = 6 |
            +--------+--------+--------+

            +--------+--------+--------+
            | Marker | Marker | Marker |
            | 8 bits | 8 bits | 8 bits |
            +--------+--------+--------+

            +--------+--------+--------+
            | Marker | Marker | Marker |
            | 8 bits | 8 bits | 8 bits |
            +--------+--------+--------+

I suppose descriptors 32 and 16 (x'20' and x'10') are not applicable in our particular case.
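The header layout quoted above (an 8-bit descriptor followed by a 16-bit byte count) can be decoded as in the following Python sketch. The class and function names are made up for this example; the bit values come from the RFC table.

```python
import struct
from enum import IntFlag


class Descriptor(IntFlag):
    """Descriptor bits from RFC 959 section 3.4.2."""
    EOR = 0x80      # 128: end of data block is EOR
    EOF = 0x40      # 64: end of data block is EOF
    SUSPECT = 0x20  # 32: suspected errors in data block
    RESTART = 0x10  # 16: data block is a restart marker


def parse_header(header: bytes) -> tuple[Descriptor, int]:
    """Split a 3-byte block header into (descriptor flags, byte count)."""
    code, count = struct.unpack(">BH", header)
    return Descriptor(code), count


flags, count = parse_header(bytes.fromhex("800050"))
print(Descriptor.EOR in flags, Descriptor.EOF in flags, count)  # True False 80
```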

Fish-Git commented 7 months ago

> I suppose dasdseq "block" mode should write data exactly the same way as "ftp get" does when "quote mode block" is in effect. For a member of a partitioned data set with RECFM=FB LRECL=80 BLKSIZE=27920, ftp get writes logical records prefixed with X'800050'. z/OS ftp does not have "quote mode record", there is just "quote mode block", but IBM documentation sometimes calls this "record mode" as well as "block mode", probably because of its functionality.

Thank you for the clarification!

> (Oops!) I'm completely wrong with point 3 above!

> It works for ascii mode too! I got translated records prefixed with x'800050'. So it would probably be better to allow -block mode with -ascii, to provide the same behaviour as ftp.

Nope. We're both wrong.

After reviewing (looking closely at) dasdseq's code (logic), it appears the "-ascii" option does not write any data whatsoever to the output file! Rather, it simply displays the records on the Hercules console! (as a Hercules message). The output file is never written to.

Now I suppose we could change that behavior, but I would rather not. It appears the original author purposely designed dasdseq that way. They just didn't document it that well.

Fish-Git commented 7 months ago

> I've also noticed that the last block of data transferred in "block mode" is prefixed with x'40....' rather than x'80....'.

> So I had a look at section 3.4.2 on pages 21-22 of RFC 959 (File Transfer Protocol):

Thank you! But I am still confused about the X'40' EOF flag. :-(

The way dasdseq works is, it reads the dataset one block at a time, and then processes each block one record at a time, until there are no more blocks remaining.

Should the EOF flag be used on the last record of each block? Or only on the last record of the very last block?

I presume it should be the latter? Yes? Only when EOF is reached on the dataset, yes? (i.e. only at the end of the very last block of the dataset? Yes?)

The reason I'm asking is that using it at the end of every block would make it possible to determine where each block ends (i.e. the blocking factor). That is to say, if the file is RECFM=FB LRECL=80 BLKSIZE=27920 and contains, say, 3500 records, it would consist of 10 full blocks of 27920 bytes (349 records in each block), followed by a final short block of only 800 bytes (the last 10 records).
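The arithmetic in that example can be checked with a few lines of Python (the variable names are chosen for this illustration):

```python
lrecl, blksize, total_records = 80, 27920, 3500

records_per_block = blksize // lrecl  # logical records in one full block
full_blocks, leftover = divmod(total_records, records_per_block)
last_block_bytes = leftover * lrecl   # size of the final short block

print(records_per_block, full_blocks, leftover, last_block_bytes)
# 349 10 10 800
```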

If the EOF flag was used on the last record of each block (i.e. every 349 records), then it would be easy to determine the size of each block. (The EOF flag would mark the end of each block).

It seems unnecessary to set the EOF flag only when EOF is reached on the dataset! We know EOF has been reached because there is no more data in the file! Duh!

So I would appreciate clarification on this. Thanks.

GregoryTwin commented 7 months ago

According to RFC 959, only the very last block should have the X'40' EOF flag. All other blocks should have just the X'80' EOR flag. To confirm this, I downloaded an LRECL=80 BLKSIZE=27920 dataset to my PC using "quote mode b", and as expected, only the very last record was marked with x'40'.

Being interested in how important this EOF flag might be, I then updated the last record of the downloaded file, replacing the x'40' with x'80' instead, and then tried uploading it back to the MF using "quote mode b". The upload was successful.
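The edit described above (changing the final x'40' to x'80') could be scripted roughly as follows. This is a Python sketch with an invented function name; it assumes the file consists solely of 3-byte headers, each followed by the number of data bytes given in its big-endian count field.

```python
import struct


def replace_final_eof(stream: bytes) -> bytes:
    """Rewrite a final X'40' descriptor as X'80'; leave the rest as-is."""
    out = bytearray(stream)
    pos = 0
    last_header = None
    while pos < len(out):
        _descriptor, count = struct.unpack_from(">BH", out, pos)
        last_header = pos      # remember where this header starts
        pos += 3 + count       # skip the header and its data
    if last_header is not None and out[last_header] == 0x40:
        out[last_header] = 0x80
    return bytes(out)


sample = b"\x80\x00\x02AB" + b"\x40\x00\x02CD"
print(replace_final_eof(sample).hex())  # 80000241428000024344
```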

So I believe this EOF flag is just an informal requirement of RFC 959. FTP apparently processes a file normally even when every record, including the very last one, is flagged with x'80'.

Fish-Git commented 7 months ago

> So I believe this EOF flag is just an informal requirement of RFC 959. FTP apparently processes a file normally even when every record, including the very last one, is flagged with x'80'.

Good! Because supporting it (the X'40' EOF flag) would complicate the existing logic in dasdseq. Supporting just the X'80' EOR flag on the other hand should be easy-peasy.

I will make the change right away.

Thanks for all your help, Gregory!

Fish-Git commented 7 months ago

Implemented in commit 179ba63797f9e1cfd820e04f2eee4e9f1ed96e68.

Feel free to give it a try!

Closing.

GregoryTwin commented 7 months ago

Thank you very much! Also thank you for correcting my English, this is useful for me.

Fish-Git commented 7 months ago

> Also thank you for correcting my English, this is useful for me.

You're very welcome! Please keep in mind however, that my own English is not perfect! :)