ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

Boot regression on FD360 #253

Closed: mfld-fr closed this issue 4 years ago

mfld-fr commented 5 years ago

Reported on the mailing list: boot is not working any more on an IBM PC/XT with 360K floppy disk: https://www.spinics.net/lists/linux-8086/msg00878.html

mfld-fr commented 5 years ago

In the new boot sector code, the disk geometry is fixed and set through the configuration. Currently only the parameters of a 3.5" 1.44M disk are defined. Add the parameters for the 5.25" 360K disk (CHS = 40 tracks / 2 heads / 9 sectors).
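As a minimal sketch of what such a configured geometry table might look like, the 360K entry can sit next to the existing 1.44M one, with the capacity following from C * H * S and 512-byte sectors. The struct, array, and function names are hypothetical, not the actual ELKS configuration symbols:

```c
#include <assert.h>

/* Hypothetical geometry table; names are illustrative,
 * not actual ELKS configuration symbols. */
struct floppy_geometry {
    const char *name;
    unsigned cylinders;  /* tracks per side */
    unsigned heads;
    unsigned sectors;    /* sectors per track */
};

static const struct floppy_geometry floppy_formats[] = {
    { "5.25\" 360K", 40, 2,  9 },
    { "3.5\" 1.44M", 80, 2, 18 },
};

/* Total sectors on the disk: C * H * S. */
static unsigned long total_sectors(const struct floppy_geometry *g)
{
    /* The INT 13h CHS interface encodes the sector number in 6 bits (1..63). */
    assert(g->sectors >= 1 && g->sectors <= 63);
    return (unsigned long)g->cylinders * g->heads * g->sectors;
}

/* Capacity in KiB, assuming 512-byte sectors. */
static unsigned long capacity_kib(const struct floppy_geometry *g)
{
    return total_sectors(g) * 512UL / 1024UL;
}
```

For the 360K disk this gives 40 * 2 * 9 = 720 sectors, i.e. 360 KiB. For the 1.44M disk it gives 80 * 2 * 18 = 2880 sectors, i.e. 1440 KiB.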

ghaerr commented 5 years ago

Is this the reason that ELKS will boot with some emulators, like VirtualBox and 8086tiny, but not others, like DOSBox and fake86?

mfld-fr commented 5 years ago

To answer that question, one needs to know the floppy drive geometry as presented by the emulator. I only know that QEMU by default presents a 3.5" 1.44 MB geometry for the floppy drive. There is some commented-out code in the new boot sector to display the CHS parameters as reported by the BIOS, which could help in selecting them in the configuration.
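For reference, the BIOS call involved is presumably INT 13h, AH=08h (get drive parameters), which returns the geometry packed into registers. The sketch below is a plain C model of decoding those registers into a CHS triple, not the boot sector's actual assembly; the names are illustrative:

```c
#include <stdint.h>

struct chs { unsigned cylinders, heads, sectors; };

/* Decode the geometry returned by INT 13h, AH=08h (get drive parameters):
 *   CH      = low 8 bits of the highest cylinder number (0-based)
 *   CL 7..6 = high 2 bits of the highest cylinder number
 *   CL 5..0 = highest sector number (1-based, so it is also the count)
 *   DH      = highest head number (0-based)
 */
static struct chs decode_int13_08(uint8_t ch, uint8_t cl, uint8_t dh)
{
    struct chs g;
    g.cylinders = ((((unsigned)cl & 0xC0u) << 2) | ch) + 1;
    g.heads     = (unsigned)dh + 1;
    g.sectors   = cl & 0x3Fu;
    return g;
}
```

For a 1.44M drive, the register values CH=0x4F, CL=0x12, DH=0x01 decode to 80 cylinders, 2 heads, 18 sectors; for a 360K drive, CH=0x27, CL=0x09, DH=0x01 decode to 40/2/9.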

ghaerr commented 5 years ago

@mfld-fr: I'm still trying to figure out the reason a few emulators won't boot ELKS. I've been looking at your rewrite of minix.c -> minix_first.S. Nice job getting rid of the need for the extra ELKS /boot/boot :) However, I still don't understand where one would set up the number of sectors for a floppy boot. Can you tell me where that's specified in the configuration?

I've attached a screenshot of a failed DOSBox boot of images/fd1440.bin, built on Linux from the latest source. Unfortunately, most of the useful information has scrolled off the screen (is there any way to get that info?). The screenshot does show that minix.bin thinks the 1.44M 3.5" floppy has 36 sectors per track when it should only have 18. I'm thinking this is the reason DOSBox and other emulators fail.

If you want to try this yourself, run DOSBox and type "boot /pathto/fd1440.bin"

Can you help me?

[Screenshot, 2019-04-18: failed DOSBox boot]
ghaerr commented 5 years ago

@mfld-fr: Never mind, I have figured out the boot problem. It seems that the minix_first/second boot code correctly uses the BIOS to get the floppy CHS, and /Image is loaded properly. However, the kernel floppy block device driver, doshd.c, attempts to re-probe the floppy (why?), and on the DOSBox and fake86 emulators this fails, coming up with 36 rather than 18 sectors for the fd1440.bin image.

Neither using menuconfig to set "hard" floppy settings nor using menuconfig to take the disk parameters from the BIOS fixes this. I ended up having to kluge "drivep->sectors = 18;" just after the probe to get the boot to work.
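A simplified model of how a probe can come up with 36: if the driver tries candidate sectors-per-track values from largest to smallest and keeps the first one whose last sector reads successfully, then an emulator that reports success for out-of-range sector numbers always stops at the largest candidate. The candidate list, its order, and all names here are assumptions for illustration, not taken from doshd.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulated BIOS read: returns true if reading `sector` on track 0 succeeded. */
typedef bool (*read_sector_fn)(unsigned sector, void *ctx);

/* Hypothetical candidates, largest first: 2.88M, 1.44M, 1.2M, 360K/720K. */
static const unsigned probe_candidates[] = { 36, 18, 15, 9 };

/* Return the first candidate whose last sector can be read, or 0 if none. */
static unsigned probe_sectors(read_sector_fn read_sector, void *ctx)
{
    for (size_t i = 0; i < sizeof probe_candidates / sizeof probe_candidates[0]; i++) {
        if (read_sector(probe_candidates[i], ctx))
            return probe_candidates[i];
    }
    return 0;
}

/* Well-behaved drive: only sectors 1..spt exist (ctx points to spt). */
static bool strict_drive(unsigned sector, void *ctx)
{
    unsigned spt = *(const unsigned *)ctx;
    return sector >= 1 && sector <= spt;
}

/* Lax emulator: reports success for any sector number. */
static bool lax_emulator(unsigned sector, void *ctx)
{
    (void)sector;
    (void)ctx;
    return true;
}
```

With a strict 18-sector drive the probe returns 18, while the lax read makes it return 36, which matches the symptom seen on DOSBox and fake86.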

I had been studying the previous minix.S boot sector code, before you rewrote it for gcc-ia16 and removed the helper minix_elks.bin loader. Very nicely done. It looks like you always use the BIOS to get the disk parameters in the rewritten minix_first.S loader, which solves the problem I had booting ELKS 0.2.0 images. I haven't pinned down exactly why some emulators fail, but at least they report the DPB correctly, so they work with the rewritten boot block.

However, since your new boot loader always uses the BIOS to get the disk parameters, I suggest that either 1) those values get passed directly to the kernel doshd block driver, or 2) the kernel block driver NOT re-probe, but instead get the disk parameters from the BIOS again and just use them. Setting the .config option "CONFIG_HW_USE_INT13_FOR_FLOPPY" does not make this work.

This can be debugged using DOSBox and the "boot /path/fd1440.bin" at the z:> prompt.

Mellvik commented 5 years ago

Good work, Gregory, and a practical hint: When things roll off the screen too fast, use your phone to film it. Has saved me any number of times...

H

mfld-fr commented 5 years ago

The current version of the new "boot sector" & "boot loader for MINIX" in bootblocks is actually a draft from my point of view. I rewrote that part because 1) we were using the DEV86 bootblocks and 2) it was an opportunity to simplify the boot process. See #225 for the context.

For drive & disk geometry, one should consider 3 cases to complete this current draft:

I intend to do the same in the ELKS kernel after finalizing the "boot sector & loader" design, but if anybody wants to do it now, welcome!

ghaerr commented 5 years ago

I could finalize the booting/probing work from @mfld-fr's excellent rewrite, but I want to know whether we need to keep the many ancient .config options, or whether we should move to something SIMPLE. There's so much #ifdef code in ELKS that it's very, very hard to follow. Can it be deleted, much like the dev86 tree was thrown away, so that we can move into the future?

I propose following @mfld-fr's suggestion almost exactly: support only two options, a "fixed" option where the disk geometries are specified at compile time, and a "get drive geometry" option (the default), where the BIOS is queried for floppy/hard disk characteristics. The other #if options would be deleted, and the doshd.c driver code cleaned up.
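The two-option policy could look roughly like the sketch below. The fixed default, the validity limits, and the fallback to the fixed geometry when the BIOS query fails or returns nonsense are assumptions added for illustration, not part of the proposal as written; all names are hypothetical:

```c
#include <stdbool.h>

struct chs { unsigned cylinders, heads, sectors; };

/* Hypothetical compile-time geometry, e.g. filled in from .config. */
static const struct chs fixed_geometry = { 80, 2, 18 };

/* Sanity-check a BIOS-reported geometry against INT 13h CHS limits. */
static bool geometry_valid(const struct chs *g)
{
    return g->cylinders >= 1 && g->cylinders <= 1024 &&
           g->heads >= 1 && g->heads <= 256 &&
           g->sectors >= 1 && g->sectors <= 63;
}

/* Two options, no probing: BIOS geometry when selected (the default),
 * otherwise the fixed one. Assumed fallback: use the fixed geometry
 * if the BIOS call failed or returned an invalid result. */
static struct chs choose_geometry(bool use_bios, bool bios_ok, const struct chs *bios)
{
    if (use_bios && bios_ok && geometry_valid(bios))
        return *bios;
    return fixed_geometry;
}
```

Keeping the choice in one small function means doshd.c would need a single decision point instead of scattered #if blocks.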

The probing code doesn't actually work on some emulators (I'm guessing they return OK when they shouldn't for larger sector values), and probing is slow on a real PC when most machines support BIOS get drive geometry. To be quite frank, I'm not sure anyone actually uses ELKS on an ancient PC at all, and if they need very strange stuff, they can go back to earlier ELKS versions. The drivers/block/floppy.c source is marked as permanently broken, and should just be deleted.

There needs to be discussion on what other .config variables can be removed.

Mellvik commented 5 years ago

Gregory, first, thanks for the energy you're putting into the project.

Secondly, a reminder: there are all kinds of diverging interests here - possibly very different from yours.

So - don't disregard old klunkers (PCs before the 80386, by my definition). My primary interest in ELKS is related to (physical) PCs from before 1990, mostly Compaq portables (III and 386). My secondary interest is to get ELKS going on the LANTRONIX Xport, which is a (now almost vintage) IoT device that looks like an Ethernet connector and has an 80186-based SoC inside, 512K of memory, etc.

So - ELKS is for physical computers (too), including OLD PCs with weird BIOSes, such as COMPAQs. In fact, we had a ball last fall getting Ethernet to work on the Compaq Portable III with the extension chassis and an NE2000 ISA board. The physical environment is SO different from a virtual one running on your 3+GHz desktop or laptop, in particular when it comes to speed. We ended up fixing serious file system performance issues discovered during the Ethernet/TCP testing.

BTW - my next project is the Xircom Parallel PE3... Some of us find it fascinating that a unidirectional (!) parallel port can deliver reasonable duplex Ethernet performance on mid-80s machines. To me, getting that to work with ELKS is an intriguing challenge.

So - let's keep physical computers - including old klunkers - in mind while we evolve ELKS into something really useable...

//Mellvik

ghaerr commented 5 years ago

@Mellvik: Thanks for your comments.

Agreed on older PCs, and thanks for the reminder that things are indeed different on real hardware than on fast, fancy modern emulators :)

However - my comments were directed at the fact that @mfld-fr rewrote the boot sector code (far superior IMO, with no helper boot loaders and no ridiculously complicated makefile for creating them), and that rewrite only supports BIOSes with “get drive geometry”. Once /linux is loaded, the doshd.c floppy drive probe doesn’t work on many emulators, and the source file remains hopelessly complicated with options that would never run with the rewritten boot block either.

I am still left wondering - given your goals and real use of very old equipment - will supporting only two floppy boot options work or not: either using BIOS get drive characteristics OR hardcoding the disk geometry? We’re already throwing out old boot code; can we discard unused/non-working sections of the kernel floppy driver or not? And if we can’t throw them out, what section would the new code be placed in: yet another confusing .config #define?

My problem is that I want to rewrite the linux floppy driver, but I can barely understand it with the many (unused?) options abounding, and couldn’t test them even if I wanted to.

My question remains: how should I rewrite the /linux floppy support so that it becomes simple, yet remains useful for the ELKS community, without trying to support options that no longer work anyway?

Mellvik commented 5 years ago

Gregory, IMHO manually configured floppy size is perfectly fine. In fact I have a hard time seeing that anything else makes sense. If I build a 1440 image, the flp size/geometry is given. If I build a 1200 image, etc etc.

In a running system, the easy (and classic) way is to have minor device numbers indicate the actual floppy format. Not elegant, but simple and minimalistic in terms of coding.

/M

ghaerr commented 5 years ago

> In a running system, the easy (and classic) way is to have minor device numbers indicate the actual floppy format. Not elegant, but simple and minimalistic in terms of coding.

@Mellvik: that’s a really good idea. It looks like ELKS uses this for hard disks, but not for floppies. Simplifying the doshd.c support to use the BIOS for the /dev/fd0 and fd1 devices, and using device numbers for a set of new /dev/fda1440 etc. devices, might allow the floppy probing code to be removed while still allowing all floppy disk formats to be used after boot.
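One possible minor-number layout for this scheme, sketched under assumptions: the bit split (two bits for the drive, three for the format), the format indices, and the convention that format 0 means "use the BIOS geometry" are all hypothetical, not an existing ELKS encoding:

```c
#include <stddef.h>

struct chs { unsigned cylinders, heads, sectors; };

/* Hypothetical minor-number layout: bits 0-1 select the drive,
 * bits 2-4 select a fixed format; format 0 means "ask the BIOS". */
#define FD_DRIVE(minor)  ((minor) & 0x03u)
#define FD_FORMAT(minor) (((minor) >> 2) & 0x07u)

static const struct chs fd_formats[] = {
    {  0, 0,  0 },  /* 0: use BIOS geometry */
    { 40, 2,  9 },  /* 1: 360K  */
    { 80, 2,  9 },  /* 2: 720K  */
    { 80, 2, 15 },  /* 3: 1.2M  */
    { 80, 2, 18 },  /* 4: 1.44M */
};

/* Fixed geometry for this minor, or NULL when the minor asks for the
 * BIOS geometry or names a format that is not in the table. */
static const struct chs *fd_fixed_format(unsigned minor)
{
    unsigned f = FD_FORMAT(minor);
    if (f == 0 || f >= sizeof fd_formats / sizeof fd_formats[0])
        return NULL;
    return &fd_formats[f];
}
```

Under this layout, minor 1 would be the second drive using the BIOS geometry (the plain fd1 case), and minor 16, i.e. (4 << 2) | 0, would be the first drive forced to 1.44M, so no probing is needed in either case.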

mfld-fr commented 4 years ago

Paul reported on the mailing list that the RAW boot sector successfully loads the 'setup' + 'system' parts, while the MINIX boot sector does not. So it looks like something the RAW boot sector does is missing from the MINIX one, and that thing would be the change to the BIOS FDD parameter table that sets the sectors per track.

mfld-fr commented 4 years ago

Reworked the MINIX boot block to make room, then added the patch of the BIOS floppy parameter table, as in the RAW boot block. This should solve the problem, but I have no hardware to test with a real 360K floppy. Closing the issue for now; will reopen it on any failure report.
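The general technique, as used by the classic Linux bootsect.S, is to copy the 11-byte diskette parameter table (pointed to by the INT 1Eh vector at 0000:0078) into RAM, raise its sectors-per-track byte (offset 4), and point the vector at the copy, because some BIOSes refuse multi-sector reads beyond the sectors-per-track value in their default table. The sketch below models those steps in C on plain byte buffers instead of real memory; the function names are hypothetical and this is not the ELKS boot block code:

```c
#include <stdint.h>
#include <string.h>

#define DPT_SIZE              11  /* standard diskette parameter table length */
#define DPT_SECTORS_PER_TRACK  4  /* offset of the "last sector on track" byte */

/* Copy the (read-only, often ROM) table into RAM and patch sectors per track. */
static void patch_dpt(const uint8_t rom_dpt[DPT_SIZE], uint8_t ram_dpt[DPT_SIZE],
                      uint8_t sectors)
{
    memcpy(ram_dpt, rom_dpt, DPT_SIZE);
    ram_dpt[DPT_SECTORS_PER_TRACK] = sectors;
}

/* Store a segment:offset far pointer in the INT 1Eh vector (address 0x78),
 * given a simulated copy of the interrupt vector table. */
static void set_int1e_vector(uint8_t ivt[0x400], uint16_t seg, uint16_t off)
{
    uint8_t *v = &ivt[0x1E * 4];
    v[0] = (uint8_t)(off & 0xFF);  /* offset, little-endian */
    v[1] = (uint8_t)(off >> 8);
    v[2] = (uint8_t)(seg & 0xFF);  /* segment, little-endian */
    v[3] = (uint8_t)(seg >> 8);
}
```

In the real boot block, the copy and the vector update happen in assembly with segment registers pointing at low memory, so the data segment has to be set back afterwards for the rest of the loader to find its own variables.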

mfld-fr commented 4 years ago

Problem occurred again in #288, so reopening this issue :disappointed:

mfld-fr commented 4 years ago

The new code to copy and change the table was not restoring the data segment after copying. Fixed in commit ea167e1, and tested on real HW (720K floppy) in #288.