davidgiven / fluxengine

PSOC5 floppy disk imaging interface
MIT License

Missing sector doesn't force a retry. #138

Closed Gandalf-ND closed 4 years ago

Gandalf-ND commented 4 years ago

I've been using FluxEngine to image a number of Norsk Data floppies, both 5.25" DS-HD and 8" SS-DD. (Yeah, I'm a friend of Tingo.) The most common problem I've run into is "Missing sectors". Rerunning the floppy often results in the missing sector being found, and in most cases in other sectors going missing instead. I have had this issue with about every fourth floppy I copy... and I've done almost 500 floppies so far.

Digging into the code, I've traced it down to a failure to detect the sector header. If the sector header is there but the data record is not detected, it leads to a "Sector is there but empty" and a repeat. If the sector header is missing, the program just skips ahead until it detects the next sector header and reads that sector. If that one is detected properly, we don't get an error. Now there is a hole in the sector table and FluxEngine fails to see a problem.

I don't know how best to detect a missing sector. Would it be possible to have an option like --ExpectedSectors=8 (in my case, for example) and test after a track read that we really do have the expected number of sectors? If we don't, we read the track again, up to the number of repetitions needed.
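The check proposed above could look roughly like the sketch below. It is only an illustration of the idea, not FluxEngine code: `TrackReader` and `readTrackWithRetries` are hypothetical names, and a real reader would work with decoded sector records rather than bare IDs.

```cpp
#include <functional>
#include <set>

// Hypothetical stand-in for one pass over a track: returns the IDs of the
// sectors that were decoded successfully on this attempt.
using TrackReader = std::function<std::set<int>()>;

// Re-read a track until `expectedSectors` distinct sector IDs have been
// collected or `maxRetries` attempts have been made. Sectors recovered on
// earlier attempts are kept, so each retry only has to fill the gaps.
// Returns true if the full count was reached.
bool readTrackWithRetries(const TrackReader& readOnce,
                          int expectedSectors,
                          int maxRetries,
                          std::set<int>& found)
{
    for (int attempt = 0; attempt < maxRetries; ++attempt) {
        for (int id : readOnce())
            found.insert(id);
        if (static_cast<int>(found.size()) >= expectedSectors)
            return true;
    }
    return false;
}
```

Counting alone cannot say which sector is missing, only that one is; it is enough to trigger a retry, though.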

I was looking through the code but it's been a number of years since I wrote any C++ so I don't feel confident enough to suggest a patch, at least not yet. I'm thinking of adding a quick and dirty hack to /lib/reader.cc in readDiskCommand() just to test my theory.

I will put the images and flux data up on a web server soon.

Thanks for a great project! :-)

davidgiven commented 4 years ago

As you say, telling the difference between a missing sector and one which is supposed not to be there is an issue. I don't think there's any solution other than to tell the reader what sectors to expect. It'd be easiest to add this to reader.cc but I think it actually makes more sense in the arch format-specific code, as that's the bit of code which actually knows.

It's a bit nasty; it would require adding a method to AbstractDecoder which would return a set of sector IDs, which readDiskCommand() would query... but it'd be fairly easy to implement.
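A rough sketch of that shape, for illustration only: the class and function names below (`DecoderSketch`, `requiredSectors`, `missingSectors`) are made up here and are not FluxEngine's actual AbstractDecoder interface.

```cpp
#include <set>

// Hypothetical sketch, not FluxEngine's real class hierarchy.
class DecoderSketch {
public:
    virtual ~DecoderSketch() = default;

    // Sector IDs this format expects on every track. An empty set means
    // "unknown", so the reader cannot tell a dropout from a sector that
    // was never there.
    virtual std::set<unsigned> requiredSectors() const { return {}; }
};

// Example format with eight sectors per track, numbered 0..7.
class EightSectorDecoderSketch : public DecoderSketch {
public:
    std::set<unsigned> requiredSectors() const override {
        return {0, 1, 2, 3, 4, 5, 6, 7};
    }
};

// What the reader could do after decoding a track: list the required
// sectors that did not turn up. A non-empty result would trigger a retry.
std::set<unsigned> missingSectors(const DecoderSketch& decoder,
                                  const std::set<unsigned>& seen)
{
    std::set<unsigned> missing;
    for (unsigned id : decoder.requiredSectors())
        if (seen.count(id) == 0)
            missing.insert(id);
    return missing;
}
```

Keeping the default as an empty set means formats that don't know their layout behave exactly as before.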

BTW, how are you getting on with the ND disks? I've recently done a whole bunch of changes and bugfixes which should make reading disks massively more reliable (i.e. fixing the horrible buffer overrun in the firmware). Unfortunately both my 5.25" drives have died and I'm unable to test it on the ND disks which tingox sent me.

Also, it should be possible to write ND disks now, with care.

Gandalf-ND commented 4 years ago

5.25" ND disks work like a charm. I'm running a version about 3 weeks old and haven't updated since then. It works well enough for me. I figured I could always match together blocks from separate runs afterwards and make a complete image from two with dropouts. Every run saves an image, a flux file, an SVG map and a text file with the output from fluxengine and ndfs, an ND file system checker. The only thing I need to do manually is to compare missing sectors between the first and subsequent reads.

Tingo should be happy to hear that we now can write floppy images too. I'm going to put up the images I'm making right now so he will soon be able to run COBOL on his computers... :-D

The only issue (which I'm trying to track down at the moment) is that I can't seem to read track 1. I'm leaning towards a problem with my 8" drive (SS/SD). Trying to read track 0 makes a lot of noise and nothing is read. Reading track 1 gives logical track 0. Reading track 2 gives logical track 2... and so on. From a data recovery standpoint it isn't a big issue, since track 1 seems to be the very last track to be used, if it is used at all.

Right now I'm putting together my third fluxengine with a DS/DD 8" drive to compare results and read a couple double sided disks.

If everything works out, I'm planning to build a mobile system with 10+ drives with a fluxengine + RPi per drive and a central file server for storage. My goal is to have a system which is limited only by how fast I can change floppies in the drives, and then tackle the software archive in Norway, give or take 18,000 floppies. :-D Wall of floppies ... and that's just half of it.

Gandalf-ND commented 4 years ago

... oh, and I had two occasions where fluxengine ran amok and detected hundreds of sectors, resulting in one 39 Tbyte and one 221 Tbyte image file! It's a good thing Linux supports sparse files or my disk would have filled up. :-D

I forgot about it but then the backup on my NAS froze... until I realized why and excluded those two files. I'm keeping them at the moment just for the lols.

Gandalf-ND commented 4 years ago

I just made a QnD (tm) addition to the /lib/reader.cc file. I created an 8-element boolean array in readDiskCommand, all set to false. Then in the check where you write "Sector OK" I just set the flag for that sector. Then just before you check for bad sectors I test whether all the sectors are OK; if not, I just set the hasBadSectors flag to true.

Worked on first try with a stubborn disk.
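For anyone wanting to try the same thing, the hack described above amounts to something like the following sketch. It is a standalone illustration, not the actual patch to reader.cc: `SectorFlags`, `markSectorOk` and `hasMissingSectors` are hypothetical names, and the eight-sector geometry is assumed from my disks.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kSectorsPerTrack = 8;  // assumed geometry from this thread

// One flag per expected sector, all false at the start of a track.
using SectorFlags = std::array<bool, kSectorsPerTrack>;

// Called at the point where a sector is reported as read correctly.
// IDs outside the expected range are ignored rather than written out of bounds.
void markSectorOk(SectorFlags& flags, std::size_t sectorId)
{
    if (sectorId < flags.size())
        flags[sectorId] = true;
}

// Called just before the bad-sector check: any unset flag means a sector
// never showed up, so the track is treated as bad and will be retried.
bool hasMissingSectors(const SectorFlags& flags)
{
    return !std::all_of(flags.begin(), flags.end(), [](bool ok) { return ok; });
}
```

The result of `hasMissingSectors` would be OR-ed into the existing hasBadSectors flag, so the normal retry path does the rest.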

Before :

writing visualisation
Autodetecting output geometry
H.SS Tracks --->
0. 0 ........................................................................X..X....
0. 1 ................................................X..X.........X.............X....
0. 2 ...........................X.....................................X..............
0. 3 ..............................................X..X.......X..X......X..........X.
0. 4 ....................................................X.............X.............
0. 5 .................................................X.X............................
0. 6 ................................................................................
0. 7 .X.................................................................X............
1. 0 ................................................................................
1. 1 ................................................................................
1. 2 ................................................................................
1. 3 ................................................................................
1. 4 ................................................................................
1. 5 ................................................................................
1. 6 ................................................................................
1. 7 ................................................................................
Good sectors: 1260/1280 (98%)
Missing sectors: 20/1280 (1%)
Bad sectors: 0/1280 (0%)
writing 80 tracks, 2 heads, 8 sectors, 1024 bytes per sector, 1280 kB total

After :

writing visualisation
Autodetecting output geometry
H.SS Tracks --->
0. 0 ................................................................................
0. 1 ................................................................................
0. 2 ................................................................................
0. 3 ................................................................................
0. 4 ................................................................................
0. 5 ................................................................................
0. 6 ................................................................................
0. 7 ................................................................................
1. 0 ................................................................................
1. 1 ................................................................................
1. 2 ................................................................................
1. 3 ................................................................................
1. 4 ................................................................................
1. 5 ................................................................................
1. 6 ................................................................................
1. 7 ................................................................................
Good sectors: 1280/1280 (100%)
Missing sectors: 0/1280 (0%)
Bad sectors: 0/1280 (0%)
writing 80 tracks, 2 heads, 8 sectors, 1024 bytes per sector, 1280 kB total

Directory name            : 211071A05-EN-01D
Object file index pointer : 508 SI: 0x1 (indexed)
User file index pointer   : 510 SI: 0x1 (indexed)
Bit file pointer          : 306 SI: 0x0 (contiguous)
No. of unreserved pages   : 1
Files:
  0   0: I        4 pages      6527 bytes 1989-01-13 10:08:18 (FLOPPY-USER)IN-OEC-EN-A05:INIT
  0   1: I       13 pages     25774 bytes 1989-01-02 16:25:25 (FLOPPY-USER)IN-OEC-EN-A05:XCOM
  0   2: I       79 pages    202854 bytes 1989-01-20 08:43:36 (FLOPPY-USER)IN-OEC-EN-A05:PROG
  0   3: I        1 page        588 bytes 1989-01-06 09:14:14 (FLOPPY-USER)OEC-DUMP-A05:MODE
  0   4: I      103 pages    399360 bytes 1988-11-09 13:07:33 (FLOPPY-USER)OEC-MAIN-A05:PROG
  0   5: I       52 pages    409600 bytes 1988-11-09 13:08:55 (FLOPPY-USER)OEC-EDIT-A05:PROG
  0   6: I       72 pages    174080 bytes 1988-11-09 13:09:45 (FLOPPY-USER)OEC-SERVER-A05:PROG
  0   7: I       20 pages    169302 bytes 1988-11-09 13:11:07 (FLOPPY-USER)OEC-CONFI-EN-A05:CONF
  0   8: I        3 pages    208390 bytes 1988-11-09 13:11:34 (FLOPPY-USER)OEC-SYS-A05:DATA
  0   9: I       12 pages    226892 bytes 1988-11-09 13:11:42 (FLOPPY-USER)OEC-DATA-A05:DATA
  0  10: I        4 pages      7967 bytes 1988-11-09 13:12:00 (FLOPPY-USER)OEC-LIB-2B-A05:BRF
  0  11: I        4 pages      8063 bytes 1988-11-09 13:12:07 (FLOPPY-USER)OEC-LIB-1B-A05:BRF
  0  12: I       75 pages    161792 bytes 1988-11-09 13:12:15 (FLOPPY-USER)OEF-FUNCT-A03:PROG
  0  13: I        6 pages    140746 bytes 1988-11-09 13:14:06 (FLOPPY-USER)OEF-FUNCT-EN-A03:CONF
  0  14: I       16 pages     32514 bytes 1988-11-09 13:14:18 (FLOPPY-USER)DDBTABLES-E08:VTM
  0  15: I      118 pages    241664 bytes 1988-11-09 13:14:45 (FLOPPY-USER)UE-ERMSG-EN-C06:ERR
  0  16: I        1 page        268 bytes 1989-01-16 09:26:23 (FLOPPY-USER)IN-OEC-EN-A05:INST
Directory size: 611 pages
Bit file size : 1 page 

I'm happy with this so there's no hurry to add that to fluxengine for my sake.

:-)

Gandalf-ND commented 4 years ago

My bad... I added a check with "Sector OK" as a debug print just below "Failed to read sector". It was all my code. Only place I used cout instead of printf so I forgot about it. Ooops. :-)

Now I'm going to re-run some images just to clean things up a bit.

davidgiven commented 4 years ago

Wow. That's a lot of disks. Also a lot of FluxEngines. I think this makes you my biggest customer! Please photograph the process?

Please test the latest firmware --- it should be drastically improved. The buffer overflow would have caused spurious corruptions, mostly showing up on HD disks (I was reusing the output buffer while USB was sending it, so if the buffer got written to faster than the USB read from it, then there'd be hilarious results...). You should also get faster reads as it no longer requires syncing with the index pulse before a read happens. It's also using less USB bandwidth, so there's at least a theoretical possibility, at least with DD disks, of running two FluxEngines per machine simultaneously. This will heavily depend on details of the USB topology, of course.

I'll see if I can get a proper fix to you soon; it's clearly necessary. The code should be simple.

Re the 8" drive and the weird track layout: you say:

Reading track 1 gives logical track 0. Reading track 2 gives logical track 2...

Is that a typo, or do you never see track 1? It's possible something's different in the way the track 0 sensor works on 8" drives meaning it's failing to home correctly, but that seems unlikely. Is there a possibility that your disk is using a different track alignment than the drive is? I know that 5.25" disks come in 96 tpi and 100 tpi variants, but I know very little about 8" disks.

davidgiven commented 4 years ago

I think I have a proper fix for this in the sectors branch: https://github.com/davidgiven/fluxengine/tree/sectors Could you give this a try, please? You'll need to build from source (I think you already do) and have the latest firmware.

BTW, there are new fluxengine test bandwidth and fluxengine test voltages commands. If you're running this on a Pi, could you try those and report what they say?

Gandalf-ND commented 4 years ago

That's not a typo. I don't see track 1 when reading 8" disks on that drive. It might be a mechanical issue that the head hits the end of the mechanism as there is a quite metallic noise whenever I try to read track 0 and it results in no data read. It might even be the limit switch gone bad or an error in my soldering of the cable (50 pin edge connector). Since I have more 8" drives available in my computer collection I will test another drive. Hopefully I will have an answer this weekend.

Most of my collection is housed in a house 100 km away but I have a DS/DD drive in my apartment, I just have to finish the cables and power supply. It takes 220V AC, 24V DC, +5V, -5V and I need to make another 50 pin edge connector from an old ISA-bus connector. As I'm dedicating a fluxengine per drive the cabling will be new also, so any errors in my old cable will not be brought over to the next drive.

My collection is more or less listed here : http://www.ndwiki.org/wiki/User:Gandalf

The wall of floppies belongs to the Norwegian Museum of Technology and is the software archive of the mini computer manufacturer Norsk Data AS. Tingo, me and a couple of other guys visited the climate controlled storage last spring. We had 6 hours to rummage through 30 pallets of old machines, documentation and software. It took me all day just to snap pictures of all the manuals and a couple of machines. Tingo concentrated on the software side and with a fluxengine I think he got about 20 images, although in his defense we were all experimenting at that stage. I brought my 8" drive and got it working just the day before but decided not to bring it inside the storage. Anyhow, at that speed it will take forever, especially considering that there's only personnel there for about six hours once a week. http://www.ndwiki.org/wiki/Telemuseums_storage_in_Fetsund

I'll check the new code and give it a try. I'll get back when I know how it worked.

I haven't tested it on a RPi yet, but I will do that soon.

Gandalf-ND commented 4 years ago

I've done some testing during the weekend... the DS/DD drive had some tricky connectors for power so I haven't been able to test it yet. And I have managed to read track 1 on the SS/SD and can do it consistently... sort of.

If the drive doesn't seek to track 0 I can read track one, but any time I try to seek to track 0 first it skips track 1 and goes straight to 2.

I get a feeling that the fluxengine does a seek to track 0 and reacts directly when the track zero signal is received. According to the service manual for the drive (pages 24-25 in http://bitsavers.org/pdf/shugart/SA8xx/39025-1_SA800_801_Service_Manual_Mar82.pdf ), the track 0 signal on the interface is asserted as soon as the step down pulse arrives while the head is on track 1. The manual states that the track 0 sensor should trigger at track 1, so that the track 0 signal from the drive can be sent whenever the step down pulse arrives.

When at track zero, logically there shouldn't be any seek commands while reading track zero, but the read head jumps part of a track and back again for every retry. fluxengine read ibm -s :d=0:s=0:t=0 results in five sharp knocks from the drive when it moves the head. It sounds like someone is punching a hole in the floppy but it's just the read head that is moving rapidly. I made a video of it. No data is read when I try to read track zero. https://youtu.be/Lq_2JOxu1II Looking at it a bit more I see that when I try to read track zero it looks like the drive takes a large step beyond track 0 and the first step up takes it to the real track zero.

I made another video for "fluxengine read ibm -s :d=0:s=0:t=0-5" https://youtu.be/hbs-T63TeLE In it you can see the large difference on the angle of the lead screw. I put a small ink point at track 1, 2 and 3. So you can see how close the tracks are and how far the head has traveled. You can single step a paused youtube video with , and . and it's obvious that it takes a giant leap from beyond zero to zero, then two tracks to 2 and after that single steps.

I haven't ruled out an error in the drive, there might be some strange things going on with the logic on the drive when the track zero sensor triggers.

Anyhow... I haven't been able to reboot the machine with the PSoC development environment yet (I need to finish a project first before closing a bunch of windows) so I'm still on my QnD (tm) solution and haven't tested your latest code yet. And I have done another dirty hack of your code: in reader.cc I check for track 80 and set it to 1, so a "fluxengine read ibm -s :d=0:s=0:t=1-80" actually reads tracks 0 and 2..77, fails on 78 and 79, and then seeks down to 1 and reads it. Image complete. QnD (tm) :-D So I'm fine. I can live with this solution for the moment and I'll take a look at the PSoC and the logic for track zero when I can reboot my machine. No stress.

I know, I'm an old dirty hacker just doing some dirty hacks. ;-)

davidgiven commented 4 years ago

Your diagnosis is spot on --- that's exactly what it does, deliberately. 5.25" and 3.5" drives require a seek from track 0 to track -1 to reset the disk change line. The drive logic swallows the actual head movement so nothing happens.

This was implemented back when I was actually using the DISK CHANGE line, but I'm not any more (to be more friendly for drives which emit READY on that pin instead). So this can probably be safely eliminated.

Look for home() in FluxEngine.cydsn/main.c and remove this line: https://github.com/davidgiven/fluxengine/blob/master/FluxEngine.cydsn/main.c#L199

davidgiven commented 4 years ago

I did a proper fix for this, and also merged in the required-sectors functionality; these should be in the latest release (including the updated firmware). Look for the --ibm-required-sectors option to fluxengine read ibm. You probably want 0-8 as the argument.
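As an illustration of what a range argument like 0-8 stands for, here is a minimal sketch that expands such a string into a set of sector IDs. It assumes the range is inclusive at both ends, which this thread does not confirm, and `parseSectorRange` is a hypothetical helper, not the parser FluxEngine uses.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Expand "A-B" (assumed inclusive) or a single number "A" into a set of
// sector IDs. Hypothetical helper for illustration only.
std::set<unsigned> parseSectorRange(const std::string& text)
{
    const std::size_t dash = text.find('-');
    const unsigned first = static_cast<unsigned>(std::stoul(text.substr(0, dash)));
    const unsigned last = (dash == std::string::npos)
        ? first
        : static_cast<unsigned>(std::stoul(text.substr(dash + 1)));
    if (last < first)
        throw std::invalid_argument("range end is before range start");

    std::set<unsigned> ids;
    for (unsigned id = first; id <= last; ++id)
        ids.insert(id);
    return ids;
}
```

Under that inclusive assumption, "0-8" expands to nine IDs and "0-7" to eight, so it is worth checking the range against the sector numbers shown in the read map for your disks.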

davidgiven commented 4 years ago

Closing for cleanup --- please reopen if you need anything else.