Mellvik / TLVC

Tiny Linux for Vintage Computers

fsck not working in 360k images #41

Open Mellvik opened 5 months ago

Mellvik commented 5 months ago

fsck reports perfectly valid minix file systems as "not a minix file system, perhaps DOS?" on 360k floppies.

ghaerr commented 5 months ago

I haven't tested this on ELKS yet, but IIRC the TLVC direct FD driver doesn't run the older probe routine like the non-async driver does, and it also operates differently with regard to auto-determining the floppy type. These may be contributors to the problem.

To test, perhaps try running hd /dev/fd0 or hd /dev/rfd0 and comparing that output to known good output for the first few sectors of the disk. In the non-async driver, the first sector of the disk is read and the EPB is used to determine the floppy type, while the second block contains the MINIX filesystem information (the superblock), which fsck apparently is rejecting.
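If the hd output is awkward to compare by eye, a small program can do the comparison. This is a sketch with hypothetical helper names (read_sectors, first_difference, report_difference). It assumes 512-byte sectors and 1K MINIX blocks, so that block 1, starting at byte offset 1024, is the superblock.

```c
#include <stdio.h>
#include <stddef.h>

#define SECTOR_SIZE      512
#define MINIX_BLOCK_SIZE 1024  /* block 0 = boot block, block 1 = MINIX superblock */

/* Read the first nsectors sectors of a device or dd image into buf.
 * Returns the number of bytes actually read (0 if the open fails). */
size_t read_sectors(const char *path, unsigned char *buf, size_t nsectors)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, nsectors * SECTOR_SIZE, f);
    fclose(f);
    return n;
}

/* Offset of the first differing byte within len bytes, or -1 if identical. */
long first_difference(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}

/* Say where a difference falls, in sector and MINIX block terms. */
void report_difference(long off)
{
    if (off < 0)
        printf("first sectors identical\n");
    else
        printf("first difference at byte %ld: sector %ld, block %ld%s\n",
               off, off / SECTOR_SIZE, off / MINIX_BLOCK_SIZE,
               off / MINIX_BLOCK_SIZE == 1 ? " (superblock)" : "");
}
```

For example, read 4 sectors from /dev/fd0 and from a known-good image into two buffers, then call report_difference(first_difference(a, b, 4 * SECTOR_SIZE)). A difference in block 1 points at the superblock that fsck is looking at.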

Mellvik commented 5 months ago

Hi @ghaerr - thanks for responding. I noticed this problem in passing and decided it was too important to just put on the list. Since the type detection works fine and the fs mounts fine, I suspect it's an fsck problem.

I'll look at it soon. Having a real XT is fun; I seem to have encountered a buffer issue. Another one, non-fatal and speed related.

Mellvik commented 2 months ago

A follow-up on this one. A different problem, but I suspect they are related: some 360k FAT floppies 'automount' as minix fs, and are mounted incorrectly when explicitly told they are FAT:

Examples (notice that the auto-detection of floppy size is always correct).

This is one that works (mounts correctly automatically):

df1: Auto-detected floppy type 360k/AT
FAT: me=fd,csz=2,#f=2,floc=1,fsz=2,rloc=5,#d=112,dloc=12,#s=720,ts=0
FAT: total 360k, fat12 format

This is one that does not work (it mounts as minix and appears empty). When mounted with the -t msdos option, this is what we get (notice the reported fs size):

df1: Auto-detected floppy type 360k/AT
FAT: me=f0,csz=1,#f=2,floc=1,fsz=9,rloc=19,#d=224,dloc=33,#s=2880,ts=0
FAT: total 1440k, fat12 format

An ls listing is just garbage.

For the record, when mounting a 1.2M FAT image, it looks like this and works fine:

df1: Auto-detected floppy type 1.2M
FAT: me=f9,csz=1,#f=2,floc=1,fsz=7,rloc=15,#d=224,dloc=29,#s=2400,ts=2396556992
FAT: total 1200k, fat12 format

@ghaerr, knowing the inside of the FAT implementation, do the FAT numbers reported at mount time mean anything to you?

ghaerr commented 2 months ago

knowing the inside of the FAT implementation, do the FAT numbers reported at mount time mean anything to you?

Yep. #s= is the total sector count. The middle (non-working) disk clearly shows up as a 1440k disk in its FAT mount stats (#s=2880), and is reported as such. The other FAT numbers are also consistent with a 1440k disk rather than a 360k one, since the root directory and the FAT table are larger. It would seem to me that the DF driver is incorrectly auto-detecting it as 360k/AT when it's actually 1440k.
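To make the numbers concrete, here is a sketch that decodes the standard DOS BPB fields from sector 0 and derives the layout values printed at mount time. The mapping of the log labels to BPB fields (csz, floc, #f, fsz, #d, me) is my reading, not taken from the TLVC FAT code, but it reproduces all three log lines above: for the 360k disk, rloc = 1 + 2*2 = 5 and dloc = 5 + 112*32/512 = 12; for the 1440k boot sector, rloc = 1 + 2*9 = 19 and dloc = 19 + 224*32/512 = 33. The struct and function names are hypothetical.

```c
/* DOS 2.0+ BPB fields in the FAT boot sector (sector 0), little-endian. */
struct fat_bpb {
    unsigned bytes_per_sector;    /* 0x0B */
    unsigned sectors_per_cluster; /* 0x0D: csz= */
    unsigned reserved_sectors;    /* 0x0E: floc= (first FAT sector) */
    unsigned num_fats;            /* 0x10: #f= */
    unsigned root_entries;        /* 0x11: #d= */
    unsigned total_sectors;       /* 0x13: #s= */
    unsigned media;               /* 0x15: me= */
    unsigned sectors_per_fat;     /* 0x16: fsz= */
};

static unsigned le16(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

void parse_bpb(const unsigned char *bs, struct fat_bpb *b)
{
    b->bytes_per_sector    = le16(bs + 0x0B);
    b->sectors_per_cluster = bs[0x0D];
    b->reserved_sectors    = le16(bs + 0x0E);
    b->num_fats            = bs[0x10];
    b->root_entries        = le16(bs + 0x11);
    b->total_sectors       = le16(bs + 0x13);
    b->media               = bs[0x15];
    b->sectors_per_fat     = le16(bs + 0x16);
}

/* First root directory sector: rloc = floc + #f * fsz */
unsigned long root_start(const struct fat_bpb *b)
{
    return b->reserved_sectors + (unsigned long)b->num_fats * b->sectors_per_fat;
}

/* First data sector: dloc = rloc + sectors holding #d 32-byte entries.
 * Returns 0 if the sector size is zero (garbage boot sector). */
unsigned long data_start(const struct fat_bpb *b)
{
    unsigned long bps = b->bytes_per_sector;

    if (bps == 0)
        return 0;
    return root_start(b) + ((unsigned long)b->root_entries * 32 + bps - 1) / bps;
}

/* Volume size in KB, as in "FAT: total ...k" */
unsigned long volume_kb(const struct fat_bpb *b)
{
    return (unsigned long)b->total_sectors * b->bytes_per_sector / 1024;
}
```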

I would suggest the following debugging (is this on real hardware, or can it be duplicated on QEMU?): 1) test with the BIOS FD driver, and 2) post dd disk images so we can try duplicating under QEMU.

My DF driver has some debug information that can be turned on for auto-detect. It would be useful to see some of that detail.

Another issue is that the DF driver doesn't currently do any of the "ELKS" probing that the BIOS FD driver does, including trying to read the ELKS EPB or the FAT BPB. Thus there is no secondary CHS check against the FAT BPB, for instance, nor any check against the ELKS signature.
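A secondary CHS check of the kind described could look roughly like this. It is a hypothetical sketch, not code from either driver: struct chs, bpb_geometry and geometry_matches are made-up names, and the ELKS EPB is not handled. The offsets 0x13 (total sectors), 0x18 (sectors per track) and 0x1A (heads) are the standard DOS BPB fields.

```c
/* Disk geometry: cylinders, heads, sectors per track. */
struct chs {
    unsigned cyl, heads, spt;
};

static unsigned le16(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

/* Derive CHS from the FAT BPB in sector 0. Returns 1 on success, 0 if the
 * fields don't describe a whole number of cylinders. */
int bpb_geometry(const unsigned char *bs, struct chs *g)
{
    unsigned long total = le16(bs + 0x13);   /* total sectors (16-bit field) */
    unsigned spt        = le16(bs + 0x18);   /* sectors per track */
    unsigned heads      = le16(bs + 0x1A);   /* number of heads */
    unsigned long per_cyl = (unsigned long)spt * heads;

    if (total == 0 || per_cyl == 0 || total % per_cyl != 0)
        return 0;
    g->cyl   = (unsigned)(total / per_cyl);
    g->heads = heads;
    g->spt   = spt;
    return 1;
}

/* Does the BPB geometry agree with what the driver detected? */
int geometry_matches(const struct chs *a, const struct chs *b)
{
    return a->cyl == b->cyl && a->heads == b->heads && a->spt == b->spt;
}
```

With the 1440k boot sector from the middle example (2880 sectors, 18 sectors/track, 2 heads), bpb_geometry yields 80 cylinders, which does not match a detected 360k geometry of 40/2/9, so the disagreement would at least be visible.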

I can explain each of the FAT numbers if you like.

Mellvik commented 2 months ago

Thanks @ghaerr - much appreciated.

Having taken another deep dive into what's actually in the boot sector (the drive is a physical 360k drive on a physical XT) I've concluded that the boot sector is somehow screwed up. I should have seen that before 'flagging' it.

The question remaining is why mount accepted it as a valid minix fs, but then again - mount cannot be protected against (pseudo) random screwups.

ghaerr commented 2 months ago

I've concluded that the boot sector is somehow screwed up.

The DF driver (unlike the BIOS FD driver) doesn't use the boot sector at all for disk CHS, correct?

why mount accepted it as a valid minix fs

I think the problem is the "auto-detect" feature of mount when running without the -t argument: there is no way to validate for sure without actually mounting the filesystem and having the larger FS code validate it. So what mount does without -t is punt: IIRC it checks the superblock for MINIX, and if not found, guesses that it is FAT.
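As a rough model of that punt (based on the recollection above, not on mount's actual source; guess_fs is a hypothetical name), the decision reduces to a single magic-number test, so anything without the MINIX magic, garbage included, is treated as FAT. It assumes the MINIX v1 layout, with s_magic at byte offset 16 of block 1.

```c
#define MINIX_BLOCK_SIZE  1024
#define MINIX_SUPER_MAGIC 0x137F   /* MINIX v1 magic, 14-character names */

enum fs_guess { FS_MINIX, FS_FAT };

/* disk: at least the first two 1K blocks of the device. */
enum fs_guess guess_fs(const unsigned char *disk)
{
    const unsigned char *sb = disk + MINIX_BLOCK_SIZE;     /* block 1: superblock */
    unsigned magic = sb[16] | ((unsigned)sb[17] << 8);    /* s_magic, little-endian */

    if (magic == MINIX_SUPER_MAGIC)
        return FS_MINIX;
    return FS_FAT;   /* no further validation: garbage also ends up here */
}
```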

but then again - mount cannot be protected against (pseudo) random screwups.

Well, it can to a degree, but more verification code will be needed. Are you sure it is mounting a MINIX disk? fsck doesn't actually do a mount, it just starts filesystem checking. Does fsck need better verification code?

When mounting, the fs/minix/inode.c::minix_read_super routine is called from fs/super.c::read_super (via do_mount) with a "silent" flag set if mount is attempting an auto-mount (i.e. no -t). You might add some debug code in the minix_read_super routine to see why it is accepting the mount. It looks like the following checks are made:

        if (ms->s_magic != MINIX_SUPER_MAGIC) {
            if (!silent)
                printk("VFS: device %D is not minixfs\n", dev);
            msgerr = err0;
            goto err_read_super_1;
        }
        if (ms->s_imap_blocks > MINIX_I_MAP_SLOTS) {
            msgerr = err4;
            goto err_read_super_1;
        }

The MINIX magic number is checked, as well as the number of IMAP blocks. Frankly, I'm also a bit surprised that the image was accepted as MINIX after checking these two numbers. It might pay to confirm with some debug statements that this is in fact what is happening.
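One way to see which check accepts an image is to run the same tests in user space against a dd image of the disk. This sketch repeats the two kernel checks and then adds a few hypothetical consistency checks of the kind "more verification code" might include. It assumes the MINIX v1 on-disk layout (16-bit fields, magic at offset 16, 32-byte inodes, 1K blocks); I_MAP_SLOTS = 8 is an assumed stand-in for the kernel's MINIX_I_MAP_SLOTS, and all names are made up.

```c
#define MINIX_SUPER_MAGIC 0x137F   /* MINIX v1, 14-character names */
#define I_MAP_SLOTS       8        /* assumed; use MINIX_I_MAP_SLOTS from the headers */

enum sb_result { SB_OK, SB_BAD_MAGIC, SB_BAD_IMAP, SB_ZERO_FIELD, SB_BAD_LAYOUT };

static unsigned rd16(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

/* sb points at the start of block 1 (the on-disk MINIX v1 superblock). */
enum sb_result check_minix_sb(const unsigned char *sb)
{
    unsigned ninodes   = rd16(sb + 0);
    unsigned nzones    = rd16(sb + 2);
    unsigned imap      = rd16(sb + 4);
    unsigned zmap      = rd16(sb + 6);
    unsigned firstdata = rd16(sb + 8);
    unsigned magic     = rd16(sb + 16);
    /* inode table size in blocks: 32-byte v1 inodes, 32 per 1K block */
    unsigned long itable = ((unsigned long)ninodes + 31) / 32;

    /* the two checks made by minix_read_super */
    if (magic != MINIX_SUPER_MAGIC)
        return SB_BAD_MAGIC;
    if (imap > I_MAP_SLOTS)
        return SB_BAD_IMAP;

    /* extra, hypothetical plausibility checks */
    if (ninodes == 0 || nzones == 0 || imap == 0 || zmap == 0)
        return SB_ZERO_FIELD;
    /* data must start after boot, super, both maps and the inode table */
    if (firstdata < 2 + imap + zmap + itable || firstdata >= nzones)
        return SB_BAD_LAYOUT;
    return SB_OK;
}

/* Human-readable reason, handy for debug output. */
const char *sb_result_str(enum sb_result r)
{
    static const char *const msg[] = {
        "ok", "bad magic", "too many imap blocks",
        "zero-sized field", "inconsistent layout"
    };
    return msg[r];
}
```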

Another solution might be to add more kernel code to try to validate FAT when MINIX is rejected, or always use the -t option.
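For the "validate FAT when MINIX is rejected" option, a minimal plausibility test on the boot sector might look like the following. The function name is hypothetical; the field offsets and the valid media descriptor bytes (0xF0 and 0xF8 to 0xFF) are standard, while requiring 512-byte sectors and the 16-bit size fields assumes floppies.

```c
static unsigned le16(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

/* Returns 1 if sector 0 carries a plausible floppy FAT BPB, else 0. */
int fat_bpb_plausible(const unsigned char *bs)
{
    unsigned bps     = le16(bs + 0x0B);
    unsigned spc     = bs[0x0D];
    unsigned nfats   = bs[0x10];
    unsigned entries = le16(bs + 0x11);
    unsigned total   = le16(bs + 0x13);
    unsigned media   = bs[0x15];
    unsigned fatsz   = le16(bs + 0x16);

    if (bps != 512)                          /* floppies use 512-byte sectors */
        return 0;
    if (spc == 0 || (spc & (spc - 1)) != 0)  /* cluster size must be a power of two */
        return 0;
    if (nfats != 1 && nfats != 2)
        return 0;
    if (media != 0xF0 && media < 0xF8)       /* valid media descriptor bytes */
        return 0;
    if (entries == 0 || total == 0 || fatsz == 0)
        return 0;
    return 1;
}
```

Note that a damaged boot sector like the one in the middle example would likely pass this test, since its 1440k BPB is internally consistent. Catching that case needs a comparison against the actual media size.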

ghaerr commented 2 months ago

fsck reports perfectly valid minix file systems as "not a minix file system, perhaps DOS?" on 360k floppies.

After reading this whole issue all over again, I may have misunderstood what the problem is. If the issue is just that fsck reports perfectly valid minix file systems as not being minix file systems, I would say the issue doesn't involve mount at all, since fsck doesn't mount a filesystem. If so, ignore my last comment above. The issue is likely that the DF driver is auto-detecting the wrong disk type and passing incorrect data to fsck, which then rejects the disk as not being MINIX.

Given the issues we've seen with the DF driver, as well as it not using the proven probe code checking ELKS and FAT BPB for CHS etc, I would suggest always testing against the known-working BIOS FD driver to rule out DF driver issues before going to the higher level code.

Mellvik commented 2 months ago

Thanks @ghaerr, and apologies for 'diverting' from the 'headline' issue, fsck fs detection.

The DF driver is in the clear on this issue: not because it's bug-free (it probably isn't), but because it correctly detects the format in all cases tested. In the case at hand, there is no detection. This is an XT, the 360k format is the only one and hard wired.

I'm having hardware problems on that system right now, otherwise this would be a good time to dive right into the reported fsck problem. I'll be back on that later.

Mellvik commented 2 months ago

Given the issues we've seen with the DF driver, as well as it not using the proven probe code checking ELKS and FAT BPB for CHS etc, I would suggest always testing against the known-working BIOS FD driver to rule out DF driver issues before going to the higher level code.

What issues are you referring to? I don't know offhand of any issues with the DF driver, so an update would be good.

ghaerr commented 2 months ago

What issues are you referring to? I don't know offhand of any issues with the DF driver

Sorry for the confusion - I meant to write "Given the issues we've seen with fsck and mount using the DF driver...".

I am not aware of any DF driver issues. I was thinking that, since the DF driver doesn't attempt to read any BPB or EPB (which the BIOS FD driver does), testing with the BIOS FD driver might bring more information to light on what is happening. For instance, the BIOS FD driver doesn't even attempt a hardware probe if sector 0 contains a BPB or EPB; in that case the CHS is taken directly from the BPB/EPB. In theory, this could allow a 1440k disk image written onto a 360k floppy to be interpreted as 1440k, especially since the BIOS may also be doing additional CHS processing behind the scenes, out of view of the ELKS driver.

This is an XT, the 360k format is the only one and hard wired.

The middle FAT mount listing in your post above is very consistent with the BPB of a 1440k floppy. How that might have ended up on a 360k floppy is puzzling, unless the boot sector were replaced, and in that case the DF and FD drivers would likely treat the media differently.

Mellvik commented 2 months ago

The middle FAT mount listing in your post above is very consistent with the BPB of a 1440k floppy. How that might have ended up on a 360k floppy is puzzling, unless the boot sector were replaced, and in that case the DF and FD drivers would likely treat the media differently.

Like I said, the explanation for this was found. Indeed puzzling, but the boot sector was for some reason damaged (or replaced) and actually contained the data as shown by mount. So mount was confused by bad data, and the subsequent (philosophical maybe) question was whether mount could have detected the garbage instead of mounting the volume. It's admittedly an odd case, but mount could easily refuse to mount a volume that pretends to be larger than the physical capability of the drive. Not an issue afaik, just a thought.
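The size check suggested here is cheap. A sketch, assuming the detected geometry is available as cylinders/heads/sectors (function names hypothetical): take the sector count the BPB claims (the 16-bit field at 0x13, or the 32-bit field at 0x20 when that is zero) and refuse the mount if it exceeds the media capacity.

```c
static unsigned long le16(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8);
}

static unsigned long le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Total sectors claimed by the BPB in sector 0. */
unsigned long fat_claimed_sectors(const unsigned char *bs)
{
    unsigned long n = le16(bs + 0x13);   /* 16-bit total sectors */

    return n ? n : le32(bs + 0x20);      /* 32-bit field if the 16-bit one is 0 */
}

/* Refuse a volume that claims more sectors than the detected media holds. */
int volume_fits(unsigned long claimed, unsigned cyl, unsigned heads, unsigned spt)
{
    unsigned long capacity = (unsigned long)cyl * heads * spt;

    return claimed != 0 && claimed <= capacity;
}
```

For the damaged disk here, 2880 claimed sectors against 40 * 2 * 9 = 720 would be refused. The same comparison could be applied to MINIX using s_nzones against the device size in 1K blocks, assuming 1K zones.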

As to the original issue, fsck - it's still pending.

BTW 1: Experience shows that the direct FD driver does a better job of reading 'weak' (old, barely readable) floppies than the (XT) BIOS, probably thanks to the 'shaker' regime in the driver.

BTW 2: Even the oldest PCs have 765 floppy controllers, which make them physically capable of using higher-density drives, at least 720k 3.5in. The BIOS prevents such use, while the direct driver enables it. In fact, if the FD controller is more recent, an XT may be able to use both 1.2M and 1.44M drive capacities, DMA speed being the real limitation. An area for experimentation now that the pieces are in place. Such 'non-native' densities would not be bootable, though.
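To put a number on the DMA side, this sketch computes each format's capacity and the data rate while a whole track is read (one track per revolution). The rotation speeds (300 RPM, and 360 RPM for 1.2M drives) are standard drive specs rather than anything measured in this thread. The figures ignore inter-sector gaps and seek/settle time, so they are the peak payload rate the DMA path would have to sustain, not the bit rate on the medium.

```c
#include <stdio.h>

struct floppy_format {
    const char *name;
    unsigned cyl, heads, spt, rpm;
};

static const struct floppy_format formats[] = {
    { "360k 5.25in DD", 40, 2,  9, 300 },
    { "720k 3.5in DD",  80, 2,  9, 300 },
    { "1.2M 5.25in HD", 80, 2, 15, 360 },
    { "1.44M 3.5in HD", 80, 2, 18, 300 },
};

/* Formatted capacity in KB (512-byte sectors). */
unsigned long capacity_kb(const struct floppy_format *f)
{
    return (unsigned long)f->cyl * f->heads * f->spt * 512 / 1024;
}

/* Payload bytes per second while reading a full track in one revolution. */
unsigned long track_rate(const struct floppy_format *f)
{
    return (unsigned long)f->spt * 512 * f->rpm / 60;
}

void print_formats(void)
{
    size_t i;

    for (i = 0; i < sizeof formats / sizeof formats[0]; i++)
        printf("%-16s %5luk %6lu bytes/s\n", formats[i].name,
               capacity_kb(&formats[i]), track_rate(&formats[i]));
}
```

Both high-density formats need about twice the sustained transfer rate of the 360k/720k formats (46080 vs 23040 bytes/s).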