brouhaha / tapeutils

GNU General Public License v2.0
22 stars 4 forks source link

The last byte of the output is truncated. #16

Open jfcl opened 2 years ago

jfcl commented 2 years ago

The tool seems to truncate the last byte of the file.

As far as I understand in this mode, the output file should always be a multiple of 5 bytes. My tool which which reads the .SAV file whines that there is missing data.

It looks like the code intentionally does this, but I'm suspicious that even attempting to truncate the file fundamentally broken. It looks to me like the existing code truncates the last byte of the file just because it is zero - not because the file size is wrong.

I don't know what the correct fix is. I don't have a lot of test cases using the other modes and I don't know if this breaks something else.

Previous versions of the code had the following comment:

 /*
   * This code attempts to remove trailing nulls from text data.  Be careful
   * not to remove trailing zeros from binary data.
   */

I added some printf() statements to the code and in my case, the last 5 bytes (last 36-bit word) in the buffer are:

buffer[bufpos-0] = 0x00 buffer[bufpos-1] = 0x00 <- should be last byte in file buffer[bufpos-2] = 0x30 buffer[bufpos-3] = 0x00 buffer[bufpos-4] = 0x00 buffer[bufpos-5] = 0x2b bufpos is 2235

A hex dump of a file created with the Github code and 127674 bytes long looks like:

0001f2a0: 00 90 4f 44 5e 77 27 00 00 00 01 91 03 77 5d 21
0001f2b0: 2b 7f 7b 20 00 30 2b 00 00 30

A hex dump of a working file (truncation code removed) and 127675 byte long looks like:

0001f2a0: 00 90 4f 44 5e 77 27 00 00 00 01 91 03 77 5d 21
0001f2b0: 2b 7f 7b 20 00 30 2b 00 00 30 00 

The command line I'm using is: t10backup -i -x diagnostics..dsrpa.sav -f ks_diag_gs.tap

I modified the code as follows:

void WriteBlock (void)
{
  char buffer [5*512];
  long bufpos, index;

  unpackdata ();

  for (index = headrh [G_LND], bufpos = 0;
       index < (headrh [G_LND] + headrh [G_SIZE]); index++) 
    {
      pars_5chars (index, &buffer [bufpos]);
      bufpos += 5;
    }

-  if (headlh [G_FLAGS] & GF_EOF) 
-   {
-      for (index = 1; index < (eightbit ? 4 : 5); index++) 
-   {
-     if (buffer[bufpos - 1] == (char) 0)
-       bufpos--;
-   }
-    }

  (void) fwrite (buffer, sizeof (char), bufpos, destination);
}
larsbrinkhoff commented 2 years ago

Some quick remarks before I dive into the details:

  1. You are talking about t10backup, correct?
  2. I think for the purpose of extracting text files, leaving out trailing NULs in the last word is a nice feature.
  3. For that to work, input will have to pad out words if EOF is encountered.
jfcl commented 2 years ago

Yes. Sorry. I'm using t10backup.
Am I correct that eight-bit mode is ASCII/Text and interchange-mode is binary data?

Are you suggesting to truncate only in eight-bit mode?

larsbrinkhoff commented 2 years ago

It seems to me eight-bit mode is not for ASCII. Maybe the intent is to extract four 8-bit bytes from a 36-bit word in the regular PDP-10 way. But as far as I can see it's not going to work because the eightbit variable doesn't affect much.

The interchange variable similarly doesn't do much. I would have guessed it's related to BACKUP10 interchange mode format. Does the -i switch work for you? Does it change anything?

My suggestion would be for your tool to accept files that are not a multiple of five octets, and pad out with zeros at the end. That's what I do in my tools.

larsbrinkhoff commented 2 years ago

One more comment. I prefer not to make any distinction between binary data and text. Where ever possible, I use a format that extract five 7-bit ASCII characters the usual way, but also puts bit 35 in the top bit of the last character. This works for both binary and text.

jfcl commented 2 years ago

I'm glad we're having this conversation. Coming from from the hardware world, I had assumed that the "eight-bit" and "interchange" command line switches in t10backup described the tape formats that were supported by the TM02 and TM03 tape formatter. That might be a leap on my part...

My understanding of tape formats are described here: https://github.com/KS10FPGA/KS10FPGA/wiki/Massbus-Tape-Controller-(MT)#tape-formatting

For example, the "five 7-bit ASCII characters the usual way, but also puts bit 35 in the top bit of the last character" describes the PDP-10 "ASCII" or "Interchange" tape format that was supported by the TM02.

While the "ASCII" format removed from the TM03, the PDP-10 or "eight-bit" or "compatible" tape format was added to the TM03. This tape format stored 32-bits of information in 8-bit bytes, was fully compliant with ANSI tape standards, but was not capable of storing PDP-10 36-bit binary data.

I guess SIMH is allowed to create any tape file formats they'd like and aren't constrained by any particular hardware implementation. That does imply that I need to add the SIMH tape formats (if they are different) to my FPGA implementation.

Thus my misunderstanding and comment about using the "eight-bit" format for ASCII data.

larsbrinkhoff commented 2 years ago

I think it's useful to clarify various views of data involved here:

  1. The file as it appears in a PDP-10 system. For the purpose of this discussion we can probably view 1. as just a sequence of 36-bit words.
  2. The tape format as written by a tape controller to a physical tape. 9-track tape formats include core dump, industry compatible, ASCII, and maybe others.
  3. The SIMH tape image format. We can disregard 7-track tape images.
  4. The file as it appears on a Unix/Windows-like host system; a sequece of octets.

I have only ever seen PDP-10 9-track tape image files written in the core dump format. I suppose PDP-10 operating systems and/or applications can use the ASCII format, but I have not seen that so far.

The data format I described above is for storing PDP-10 files in an octet-based host file system. This format is the same as the ASCII physical tape format, but don't confuse the two different usages.