ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

ELKS on Book 8088 #1619

Closed Vutshi closed 11 months ago

Vutshi commented 1 year ago

Description

I have a new and shiny laptop based on the 8088 processor. You can read about the Book 8088 on Ars Technica. It's a fascinating device that allows for easy CPU changes. I've already tried the V20, Intel 8088, and a Soviet clone of the 8088, and they all work well in DOS. However, I'm currently having trouble getting it to boot ELKS out of the box.

UPDATE: For hardware details, see the blog post in this comment: https://github.com/ghaerr/elks/issues/1619#issuecomment-1720048680

UPDATE2: More hardware details from Sergey Kiselev:

The schematic of Book8088 is quite interesting. In addition to the 8088 CPU and the 8087 FPU, it uses an original 8284 clock generator, 8288 bus controller, 8237 DMA controller, 8259 PIC, and 8253 PIT. It looks like they've put most of the discrete logic functionality of an XT motherboard into a CPLD; the CGA controller functionality is also implemented as two CPLDs and the original 6845 CRT controller. According to the pinout, the CPLDs seem to be Altera MAX7000S series or Atmel ATF1500AS series in a PLCC84 package. They also use some kind of microcontroller to implement the XT keyboard, which is interfaced directly to the XT motherboard logic CPLD. The system uses two EPROMs (why not flash ROMs?!): one 27C512 for the BIOS and one 27C256 for the CGA font. The system memory uses one 512 KB SRAM and one 128 KB SRAM chip. CGA uses a 32 KB SRAM for the video RAM.

Another technical detail, the data bus of all 82xx controllers, the XT logic CPLD, and the memory are connected directly to the CPU data transceiver. This probably will reduce reliability when using external ISA cards. Typically, these things sit behind an additional transceiver...

Configuration

Additional information

$ sudo ./a.out /dev/sdd
Opening drive /dev/sdd..
MBR magic bytes in place!
Analyzing Partition 0
This is bootable
Partition is 63504 sectors (31.01MB)
Starting CHS values C=1, H=1, S=0
Ending CHS values C=16, H=63, S=63
Partition starts at sector number 63 (31.00KB in)
Partition filesystem id is 128
Analyzing Partition 1
This is not bootable
Is an empty partition
Analyzing Partition 2
This is not bootable
Is an empty partition
Analyzing Partition 3
This is not bootable
Is an empty partition

Vutshi commented 1 year ago

One more thing. The BIOS of the device can be found here: https://github.com/skiselev/8088_bios. The unknown Chinese manufacturer of the Book 8088 just took Sergey Kiselev's BIOS and modified it a little. He backported the changes to the master branch recently. As far as I can see, the changes mainly concern Turbo mode and keyboard INT 09h.

ghaerr commented 1 year ago

Hello @Vutshi,

Looks like some new fun on your shiny new 8088 laptop! I will read more about it, thanks for the links.

hd32-fat.img starts to boot a little bit: ELKS****1!

The 1 above means disk read error, and I would guess probably an invalid CHS value as you're probably thinking.

You might also try booting the hd32-minix.img version instead of FAT, as the FAT boot loader is more complicated and uses BIOS Parameter Block values from the boot sector which might not be correct.

Welcome to ELKS MBR Boot Manager No bootable partition

It is a contradiction that the ELKS MBR says no bootable partition, while the mbr-analyzer recognizes a boot partition. I don't yet know the reason for this, we will have to investigate further.

I would suggest using MINIX rather than FAT boot images, and don't use an MBR image, just a flat image, and see what else we find. This reduces the complexity of the boot and ELKS will boot non-MBR HD images as though they were large floppies, as long as the CHS values are correct. The ELKS HD image CHS values can be changed by editing CONFIG_IMG_SECT/HEAD/CYL values in .config for the MINIX image HD build.

Thank you!

Vutshi commented 1 year ago

Hi @ghaerr

You might also try booting the hd32-minix.img version instead of FAT, as the FAT boot loader is more complicated and uses BIOS Parameter Block values from the boot sector which might not be correct.

We tried Minix versions as well, they give exactly same results as FAT images.

The ELKS HD image CHS values can be changed by editing CONFIG_IMG_SECT/HEAD/CYL values in .config for the MINIX image HD build.

Then what values should I write there? The same as reported by DOS disk which boots well? Note, that I use physically different CF cards for DOS and ELKS.

Best

ghaerr commented 1 year ago

We tried Minix versions as well, they give exactly same results as FAT images.

You mean that MINIX images showed ELKS *****1!? This means error on the sixth sector read. For MINIX images, the CHS should have the boot block at 0/0/1, and continue reading 0/0/2, etc until max sectors, then the head number is incremented, then the cylinder, etc. The ELKS boot prepares a DDPT and attempts to perform a multi-sector read, which could be failing due to BIOS strangeness. Perhaps #undef FAST_READ in bootblocks/boot_sect.S line 54 and build a new image.
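The read order described above can be sketched in C. This is a minimal illustration, not the ELKS boot loader code: struct chs and chs_next are hypothetical names, and the 16-head, 63-sector geometry used in the note below is only an example.

```c
#include <assert.h>

/* Hypothetical helper type: a CHS address as passed to BIOS disk reads. */
struct chs {
    unsigned cyl;    /* 0-based cylinder */
    unsigned head;   /* 0-based head */
    unsigned sector; /* 1-based sector */
};

/*
 * Advance to the next sector in the order described above: the sector
 * number runs 1..spt, then the head is incremented, then the cylinder.
 */
struct chs chs_next(struct chs pos, unsigned heads, unsigned spt)
{
    assert(heads > 0 && spt > 0);
    if (pos.sector < spt) {
        pos.sector++;
    } else {
        pos.sector = 1;
        if (pos.head + 1 < heads) {
            pos.head++;
        } else {
            pos.head = 0;
            pos.cyl++;
        }
    }
    return pos;
}
```

Starting from 0/0/1 with 63 sectors per track, repeated calls walk 0/0/2 up to 0/0/63, then 0/1/1, and so on. If the sectors-per-track value stored in the image differs from what the BIOS expects, the walk diverges from the real layout at the first track boundary.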

Then what values should I write there? The same as reported by DOS disk which boots well? Note, that I use physically different CF cards for DOS and ELKS.

You mean that the CF card is a different size?

I'm not sure what the CHS values should be to correct this at this point; I am hoping that disabling the multi-sector reads described above helps. However, the FAT boot loader doesn't use FAST_READ and it's still broken. So it likely won't work.

We have had other issues in the past related to the BIOS not handling disk reads after BIOS DISK RESET calls. I don't recall the exact details.

I am also a bit confused with the MBR analyzer showing start sector 0, as sector 1 is the default numbering scheme. Also, ELKS MBR when you typed 1 showed no partition, which is not correct, as there is a partition there. Normally however the partitions start at a cylinder boundary and these are showing Sector 63 (which is usually the last sector of a 63-sector disk numbered 1-63).

We have others running ELKS on Sergey's BIOS. Do you have more information on the changes required for your laptop? Will it run with the original unmodified Sergey 8088 BIOS?

Vutshi commented 1 year ago

@ghaerr

You mean that MINIX images showed ELKS *****1!?

Yes. But after looking closely I also see three dots in front: Minix

You mean that the CF card is a different size?

Yes, DOS card is 512Mb, ELKS one is 256Mb

We have others running ELKS on Sergey's BIOS. Do you have more information on the changes made required for your laptop? Will it run with the original unmodified Sergey 8088 BIOS?

Currently I run the stock BIOS. As soon as new flash chip arrives I will use Sergey's Book8088 image of his BIOS. Sergey says that XT-IDE BIOS extension is outdated on the stock BIOS, so I will stop experimenting until switching to Sergey's new BIOS image.

ghaerr commented 1 year ago

But after looking closely I also see three dots in front

Ah, I was in error in my last reply: a . means successful read and * means retry. This means that the first three sectors loaded properly, and the fourth had the normal five retries then disk read error.

For MINIX non-MBR disks, the first sector (physical sector 1) would be read by the boot loader. Then the ELKS MINIX boot loader (when FAST_READ is not defined, so there are no multi-sector reads) reads the second sector of the 1K boot block (physical sector 2), producing the first ".". Then, the MINIX boot loader (running from the just-loaded second sector) reads the 1K MINIX super block into memory, which would be physical sectors 3 and 4 on cylinder 0 head 0, producing the next two dots. It appears then that the sectors being read after the superblock (which would be the location of the root inode) might not be returning the correct data.
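To make reading these progress strings less error-prone, here is a small sketch that tallies them. It assumes only what is described in this thread: "." is a successful read, "*" is a retry, and a digit followed by "!" is an error code. The names boot_progress and parse_boot_progress are made up for illustration.

```c
/* Hypothetical summary of a boot progress string such as "...*****1!". */
struct boot_progress {
    unsigned ok;      /* '.' characters: successful sector reads */
    unsigned retries; /* '*' characters: read retries */
    int error;        /* digit printed just before '!', or -1 if none */
};

struct boot_progress parse_boot_progress(const char *s)
{
    struct boot_progress bp = {0, 0, -1};

    for (; *s != '\0'; s++) {
        if (*s == '.')
            bp.ok++;
        else if (*s == '*')
            bp.retries++;
        else if (*s >= '0' && *s <= '9' && s[1] == '!')
            bp.error = *s - '0';
    }
    return bp;
}
```

For the string "...*****1!" this yields three successful reads, five retries, and error code 1 (disk read error, as noted earlier in the thread).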

What brand floppy emulator are you using with the CF cards? IIRC we have had some issues with floppy emulators as well.

tyama501 commented 1 year ago

I have read about that 8088 book article before. Very interesting.

Does the Book have floppy emulation from the USB port? (Maybe not?) If you can boot from USB with the floppy image, maybe you can install to the hard drive with the sys command.

Vutshi commented 1 year ago

@ghaerr

Ah, I was in error in my last reply: a . means successful read and * means retry. This means that the first three sectors loaded properly, and the fourth had the normal five retries then disk read error.

Does it make sense that unlike minix, the FAT image didn't manage to produce the three dots?

What brand floppy emulator are you using with the CF cards? IIRC we have had some issues with floppy emulators as well.

I don't really know. According to the system block diagram the CF card goes into IO chip: System Block Diagram

The IO chip seems to be realized on a programmable logic device ATF1508-XT IO: IMG_7490

The arstechnica article says that "Hard disk functionality is provided by an integrated XTIDE controller". It uses XTIDE BIOS extension.

Vutshi commented 1 year ago

Hi @tyama501

Does the book have floppy emulation from the USB port? (Maybe no?) If you can boot from the USB with the floppy image, may be you can install to the harddrive with sys command.

It seems the USB port can be used for storage devices only after booting into DOS and loading CH375DOS.SYS driver BOOK8088说明书CN new.docx

tyama501 commented 1 year ago

〉This reduces the complexity of the boot and ELKS will boot non-MBR HD images as though they were large floppies,

I am not sure the BIOS can handle the disk without MBR. Is it possible?

I think the CHS values of the whole disk depend on the CF card and the IDE emulator, so it might be better to use the same CF card as for MS-DOS. Then create an ELKS FAT partition image with H:63, S:61 and an MBR, the same as MS-DOS.

tyama501 commented 1 year ago

Oh easier way.

If you can load the MS-DOS partition image and the ELKS floppy image in QEMU, maybe you can install to the partition using the sys command.

(Although in this case you need to format the MS-DOS first.)

ghaerr commented 1 year ago

I am not sure the BIOS can handle the disk without MBR. Is it possible?

The BIOS disk read functions don't know or care about the MBR itself, as the disk contents are accessed solely through C/H/S parameters. As long as the filesystem starts in the sector following the boot block(s) (1 for FAT, 2 for MINIX), this shouldn't be an issue.

I think CHS values of the whole disk depends on CF cards and the IDE emulater so it might be better to use the same CF card with the MS-DOS.

I am not aware of how CF cards calculate CHS, but if H/S varies with CF card size, this could definitely be an issue. Perhaps try this instead of rebuilding MINIX image with below option:

Then create a image with H:63, S:61 FAT partition elks image with MBR as the MS-DOS.

That's a very good idea - the MINIX ELKS floppies and HD images require the max head and sector value to be stored near the end of the boot sector, and this must match the HD emulator-expected CHS. For FAT ELKS floppies, the max head and sector values are stored in the DOS BPB block at the beginning of the boot sector.
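For the FAT case, the geometry fields sit at fixed offsets in the standard DOS BPB: bytes per sector at 0x0B, sectors per track at 0x18, and number of heads at 0x1A, each a little-endian 16-bit value. A minimal sketch for checking them in an image's boot sector follows. The struct and function names are hypothetical, and the MINIX boot sector is deliberately not covered, since the exact offset of its values isn't given here.

```c
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value from a buffer. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

struct bpb_geometry {
    uint16_t bytes_per_sector;  /* BPB offset 0x0B */
    uint16_t sectors_per_track; /* BPB offset 0x18 */
    uint16_t heads;             /* BPB offset 0x1A */
};

/* boot_sector must point to at least the first 0x1C bytes of a FAT boot sector. */
struct bpb_geometry read_bpb_geometry(const uint8_t *boot_sector)
{
    struct bpb_geometry g;

    g.bytes_per_sector  = le16(boot_sector + 0x0B);
    g.sectors_per_track = le16(boot_sector + 0x18);
    g.heads             = le16(boot_sector + 0x1A);
    return g;
}
```

Reading the first 512 bytes of a FAT image into a buffer and passing it to read_bpb_geometry shows which heads/sectors values the FAT boot loader will use, which can then be compared with what the disk utilities report for the CF card.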

Vutshi commented 1 year ago

@ghaerr Before going into compiling ELKS images with various CHS values we decided to check what DOS thinks about the CHS of the drive. It turns out to be a rather peculiar drive. Some utils don’t see it at all (idediag). Some (CheckIt) report an error but provide the CHS data: Checkit

The parameters are different from what is reported by analyse_mbr.c. This little program reports raw values of the three CHS bytes. According to Wikipedia they have to be decoded and rearranged to get the true CHS, but even taking all of this into account I cannot get values which agree with the CheckIt data. Strange disk.

We are waiting for flash chips to flash Sergey’s BIOS with an up-to-date XTIDE BIOS.
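The decoding of the three raw CHS bytes mentioned above can be sketched as follows. In an MBR partition entry the first byte is the head, the second byte holds the sector in bits 0-5 and cylinder bits 8-9 in bits 6-7, and the third byte holds the low eight bits of the cylinder. The names struct chs and decode_mbr_chs are hypothetical; this is not the code of analyse_mbr.c.

```c
#include <stdint.h>

struct chs {
    unsigned cyl;    /* 0-based, 10 bits */
    unsigned head;   /* 0-based */
    unsigned sector; /* 1-based, 6 bits */
};

/* Decode the three CHS bytes of an MBR partition entry, in on-disk order. */
struct chs decode_mbr_chs(const uint8_t raw[3])
{
    struct chs out;

    out.head   = raw[0];
    out.sector = raw[1] & 0x3F;
    out.cyl    = ((unsigned)(raw[1] & 0xC0) << 2) | raw[2];
    return out;
}
```

As an example, raw bytes 1, 1, 0 decode to cylinder 0, head 1, sector 1. If the analyzer printed the raw start bytes in on-disk order (an assumption), that would be consistent with a partition starting at sector number 63 on a 63-sectors-per-track geometry.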

ghaerr commented 1 year ago

@Vutshi,

Some utils don’t see it at all (idediag). Some (CheckIt) report an error but provide the CHS data We are waiting for flash chips to flash Sergei’s BIOS with up to date XTIDE BIOS

I had thought that you were already running the updated BIOS with XTIDE support... my latest theory based on your new evidence is that the drive controller isn't fully IDE compliant (thus the errors with DOS diagnostics programs) but that it still works with some boot programs, possibly the result of timing interactions with the existing likely deficient BIOS when booting DOS.

The default ELKS hd32 images are built with heads:16 and sectors:63. The max cylinders doesn't matter at boot time (since without partitions it will always be 0), and is only an issue later if it conflicts with the filesystem's internal size spec. Even if the HD sector count is 61 and not 63, that difference should not produce the results seen in the boot screens, so I'm still thinking that this issue has something to do with the way the drive controller is being accessed by the BIOS. I haven't yet dug into the XTIDE universal BIOS, although your supplied link lists some interesting information in the Known problems section, especially with CF cards and their MBR requirements (!).

ghaerr commented 1 year ago

@Vutshi,

I read the Ars Technica article on the Book 8088, very interesting!

I don't know if you happen to have read the MinusZeroDegrees website about issues relating to XTIDE, I found this particular link interesting: https://minuszerodegrees.net/xtide/XT-IDE%20-%20Problems.htm

That page details some issues with XTIDE and the occasional requirement for the CF card to have an MBR boot block, or the boot fails. I am guessing the reason might be that the XTIDE controller firmware/software requires the MBR in order to set up the CHS for the emulated drive (even though the MBR doesn't technically specify the drive CHS values). It is also possible that XTIDE w/CF may require a FAT formatted disk and not work with ELKS MINIX filesystems. We will have to wait and see what is learned with the updated BIOS from Sergey.

Finally, it could be the case that there is a problem with ELKS' MBR tables that renders them incompatible with XTIDE, although I suppose that's unlikely. I have seen some BIOS code that handles reading and writing of the boot sector (MBR in this case) specially, and that could be happening here. The proposed solution to this problem is to run FDISK.EXE /MBR on the problem boot CF card using DOS v5.0 or later.

Here is another page of interesting information on XTIDE: https://minuszerodegrees.net/xtide/XT-IDE%20-%20Basics.htm

toncho11 commented 1 year ago

https://github.com/jbruchon/elks/wiki/Installing-HD-image-on-physical-media

There are some comments at the bottom of the page that might be important.

ghaerr commented 1 year ago

There are some comments at the bottom of the page that might be important.

Do you mean the issue with boot sometimes failing when IDE probing is turned on? I fixed that a while ago, if configured ON, it will now time out rather than hang if an IDE isn't present. The current Book 8088 boot problem is occurring during the time the boot loader is reading in the kernel, so kernel configuration isn't an issue (yet).

toncho11 commented 1 year ago

I see.

@Vutshi is using stock images of the 0.6 release, right? Maybe using the current code base 0.7 will yield better results.

ghaerr commented 1 year ago

Maybe using the current code base 0.7 will yield better results.

Agreed, there's been lots of progress since 0.6. I think we should push out v0.7, since there has been not so much activity and lots of changes?

Vutshi commented 1 year ago

@ghaerr

Agreed, there's been lots of progress since 0.6. I think we should push out v0.7, since there has been not so much activity and lots of changes?

That would be cool, as I currently have some problems with the Linux computer which I usually use to build ELKS. P.S. Our new flash chip is out for delivery. Soon we will try Sergey’s latest BIOS.

toncho11 commented 1 year ago

@Vutshi It seems a MINIX image of 0.7 works, as pointed out in https://github.com/jbruchon/elks/discussions/1370#discussioncomment-6573684

toncho11 commented 1 year ago

@Vutshi You should try hd64-minix.img of 0.7 with Rufus.

ghaerr commented 1 year ago

Hello @Vutshi, here's the latest hd64-minix.img (soon to be v0.7.0) for you to try, as discussed in https://github.com/jbruchon/elks/discussions/1370#discussioncomment-6573684. It's a non-MBR image; I can't remember exactly why ELKS isn't creating an MBR version. We'll likely have to figure out why the 32MB FAT image won't boot after v0.7.0, but hopefully this image will work for you as it did for @tt1542.

hd64-minix.img.zip

Vutshi commented 1 year ago

Thank you @ghaerr, I’ll try it asap!

ghaerr commented 1 year ago

@Vutshi, as per @tt1542's 360k FAT floppy boot working OK described in https://github.com/jbruchon/elks/issues/1625#issuecomment-1656101842, here's a current version for that as well for your testing:

fd360-fat.img.zip

Vutshi commented 1 year ago

@ghaerr We have made some progress, but we are not fully there yet. First of all, the latest version of Sergey's BIOS is installed and works well as can be seen in the screenshots below. We tried both ELKS 0.7 images written by Rufus and dd with identical results: hd64-minix.img.zip hd64

fd360-fat.img.zip fd360

The small fd360 image was tested using two different CF cards (32Mb and 256Mb) with the same results. This was done because we noticed strange behaviour of the 256Mb card. Namely, we cloned a working CF card containing DOS onto this 256Mb card using dd and the cloned version didn't boot at all. ELKS at least tries to boot from this card...

Vutshi commented 1 year ago

Quite strange. I attempted to create a DOS boot disk using Rufus (which has that option), but unfortunately, Book8088 doesn't seem to boot from either of the two CF cards I have. DOS

I have already placed an order for new CF cards.

ghaerr commented 1 year ago

First of all, the latest version of Sergey's BIOS is installed and works well as can be seen in the screenshots below.

You mean the new BIOS works well displaying itself on the screen? But does it still boot DOS like the old BIOS used to? It would be interesting to know whether it still boots DOS on the CF card you previously used that worked.

Namely, we cloned a working CF card containing DOS onto this 256Mb card using dd and the cloned version didn't boot at all.

Sounds like there definitely could be some big sensitivity to various CF cards. IIRC @tt1542 said he used a 2Gb CF card which worked.

The ELKS at least tries to boot from this card...

It could be that DOS is trying but the boot loader doesn't show success (or failure) of each sector, and the boot is failing for the same (as yet unknown) reason of CF card incompatibility.

Are there any BIOS settings for the CF card/floppy that might be changed to have any effect?

ghaerr commented 1 year ago

@Vutshi, if you carefully compare the screenshot of @tt1542's working Book 8088 boot in https://github.com/jbruchon/elks/discussions/1370#discussioncomment-6573684 with your boot screenshot of hd64-minix.img booting, we can see that while your system displayed "3!" after four dots, the working boot displayed "Linux found" after four dots. Error "3" means "No system found", which means the boot loader couldn't find "/linux" in the root inode. It would seem your system is having problems accurately reading CF sectors for some reason, and the problem is likely some very low-level CF/floppy read issue/incompatibility.

Your other screenshot showing an ELKS FAT boot won't show "Linux found" even when properly booting, as the FAT boot loader operates differently than the MINIX one.

@tt1542, can you take a screenshot of your Book 8088's BIOS startup screen, or any particular (re)configurations you have set? What brand CF card are you using (2Gb version)? I am wondering whether the problem is solely a CF hardware/firmware issue, or whether it may have something to do with XTIDE settings/version?

Vutshi commented 1 year ago

@ghaerr

You mean the new BIOS works well displaying itself on the screen? But does it still boot DOS like the old BIOS used to? It would be interesting to know whether it still boots DOS on the CF card you previously used that worked.

Yes, DOS works well with the new BIOS. In fact, it is even better now, as at least one bug has been fixed. SysChk now shows the disk information, whereas it used to hang on the stock BIOS: syscheck_new_bios

Are there any BIOS settings for the CF card/floppy that might be changed to have any effect?

Not that I am aware of.

Do you have any idea why the fd360 image goes further in the booting process than the hd64 image? Perhaps it's worth trying fd360 with Minix?

ghaerr commented 1 year ago

@Vutshi,

DOS works well with the new BIOS.

Have you tried booting ELKS using the same CF card that works with DOS? There's definitely something strange going on with various CF cards and your system.

Do you have any idea why the fd360 image goes further in the booting process than the hd64 image?

The FAT boots will always display a different number of dots (i.e. more) than the MINIX images; I think that is what is happening. [EDIT: The 2nd half of the FAT boot loader is entirely different from the MINIX boot loader; combined with the FAT and MINIX filesystem differences, this means the number of dots displayed differs.]

Perhaps it's worth trying fd360 with Minix?

Here's fd360-minix.img:

fd360-minix.img.zip

tt1542 commented 1 year ago

@Vutshi , just to let you know that I did NOT test or use the 360K floppy image with my Book 8088.

I solely used the Minix 64M image that @ghaerr linked to above.

I used a SanDisk ultra II 2.0GB CF disk for testing. No reconfigurations in the BIOS were made (and I do not know of any that could be made there at all...). This system has the later version of the "stock" BIOS w/o the bogus copyright notices.

Vutshi commented 1 year ago

@tt1542 thank you for the information about the CF card. Nowadays, it is so hard to buy a small card. 1Tb is simple, 2Gb is impossible :) @ghaerr, does ELKS care about the size of the disk? Would it work from a 16GB card?

ghaerr commented 1 year ago

does ELKS care about the size of the disk? Would it work from a 16GB card?

No, the boot doesn't care about the max cylinders, but does care about Heads and Sectors (to know when to increment the cylinder) - it gets the CHS values from the MBR (for FAT disks) or from the ELKS BPB (BIOS Parameter Block) in the first sector (for MINIX disks).

The XTIDE has to emulate CHS to calculate a logical block address. It may know about FAT boot sectors and MBRs, but surely does not know about ELKS MINIX disks. However, the FAT boot isn't working either. So we likely have to match H and S from the MINIX disk image to the CF card via the XTIDE emulation. I don't really know how XTIDE works. If XTIDE uses 16 heads and 63 sectors, it would match the default 64Mb HD MINIX image.
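The CHS-to-LBA calculation mentioned above is the standard formula LBA = (C * heads + H) * sectors_per_track + (S - 1). A minimal sketch, with chs_to_lba as a hypothetical name and no claim about how XTIDE implements it internally:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a CHS address to a 0-based logical block address.
 * cyl and head are 0-based, sector is 1-based; heads and spt describe
 * the geometry the translation assumes.
 */
uint32_t chs_to_lba(uint32_t cyl, uint32_t head, uint32_t sector,
                    uint32_t heads, uint32_t spt)
{
    assert(sector >= 1 && sector <= spt && head < heads);
    return (cyl * heads + head) * spt + (sector - 1);
}
```

With 16 heads and 63 sectors, CHS 0/1/1 maps to LBA 63. If the translation layer assumed 61 sectors per track instead, the same CHS request would land on LBA 61, which illustrates why the H and S values in the image need to match the emulated geometry.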

I suppose it is possible that the CHS for the CF card you're using doesn't match the ELKS HD 64Mb MINIX image, which is by default CHS 127,16,63. To change the values, you can run setboot -B<sectors>,<heads>,<cylinders> hd64-minix.img to set H and/or S to something other than 16 and 63. (fdisk can be used to change MBR CHS values).
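As a quick arithmetic check of those default values, the capacity implied by a CHS geometry is simply cylinders * heads * sectors, times the sector size. The sketch below assumes 512-byte sectors; the function names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed sector size in bytes */

/* Total number of sectors addressable with the given geometry. */
uint32_t chs_total_sectors(uint32_t cyls, uint32_t heads, uint32_t spt)
{
    return cyls * heads * spt;
}

/* Total capacity in bytes for the given geometry. */
uint64_t chs_total_bytes(uint32_t cyls, uint32_t heads, uint32_t spt)
{
    return (uint64_t)chs_total_sectors(cyls, heads, spt) * SECTOR_SIZE;
}
```

For CHS 127,16,63 this gives 128016 sectors, or 65,544,192 bytes. Likewise, the 63504 sectors reported by the MBR analyzer for the hd32 partition equals 63 * 16 * 63.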

toncho11 commented 1 year ago

You can also use an SD card inside a CompactFlash adapter. Looks like eBay is the best source for small CF cards.

toncho11 commented 1 year ago

Yes, on one of my computers I do install ELKS on a 32 GB sd card using XT-IDE with IDE to SD card adapter. First I boot from floppy into ELKS, then I do partitioning and then I use the sys command in ELKS to install ELKS on the HDD (the sd card).

This is the install procedure: https://github.com/jbruchon/elks/wiki#installation

toncho11 commented 1 year ago

The Book 8088 can use an external ISA card, right? So @Vutshi you should try putting in an ISA floppy controller and booting ELKS from a floppy (360kb or 1.44). This will give all sorts of diagnostic info and it is supposed to work perfectly. I am using the "ISA 8bit High Density Floppy", the one designed by Sergey Kiselev, which I bought on eBay.

image

This will work for a 1.44 floppy drive directly. If you want 360kb you need a cable that has both the IDE connector and the old connector for the 360kb floppies (where part of the cable is crossed).

Actually any 8 bit ISA floppy controller should work.

Vutshi commented 1 year ago

Hi @ghaerr

Exciting news! Our fresh batch of industrial-grade SLC CF cards has just been delivered. Hopefully, it'll suffice for the next 50+ years of computer archaeology. :) CF cards

The initial outcome is that Book8088 successfully boots ELKS 0.7 without any issues: Minix boots

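Yet, what captivates is not how something works, but how it breaks. Two potentially linked problems have surfaced with Book8088.

The date command does feel the time flow. However, the clock function seems to suggest that Book8088 existed near a black hole's event horizon.

The reboot and shutdown commands kill the Minix version of ELKS. It basically fails to boot and becomes stuck after displaying the time. The only way to fix it is to rewrite the ELKS image. Importantly, this problem does not affect the FAT version of ELKS.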

Best.

ghaerr commented 1 year ago

Hello @Vutshi,

So the whole original boot problem was crappy CF cards? Geez!!! Well, I'm glad that's been figured out :)

date command does feel the time flow. However, the clock function seems to suggest that Book8088 existed near a black hole's event horizon

date runs off the hardware timer tick. clock is used in /etc/rc.sys to read the CMOS real time clock. clock handles the RTC outside the kernel, then sets the time with a kernel syscall. So I'd guess that the Book 8088 RTC isn't IBM compatible? You can take a quick look at elkscmd/sys_utils/clock.c to see what you think. It's a bit of a mess, as it's a very old program and the RTC code for all the ELKS architectures has been added in there. It needs a cleanup.
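One detail worth keeping in mind when checking RTC readings: on AT-compatible machines the CMOS RTC normally returns its time fields as packed BCD, so a raw register value of 0x59 means 59. Whether the Book 8088 RTC behaves this way is exactly the open question here. The helper below is a generic sketch with a hypothetical name, not code taken from clock.c.

```c
#include <stdint.h>

/* Convert one packed-BCD byte (e.g. 0x59) to its binary value (59). */
uint8_t bcd_to_bin(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}
```

If values printed by clock look wildly wrong, comparing the raw register bytes against their BCD-decoded form is one cheap way to tell an encoding mismatch from a chip that isn't answering at all.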

The reboot and shutdown commands kill the Minix version of ELKS.

That is strange, although it's possible that some of the Advanced Power Management or reboot code in ELKS isn't compatible with the Book 8088. reboot essentially does:

...
            case 0x0123:                /* reboot*/
                hard_reset_now();
                printk("Reboot failed\n");
                /* fall through*/
            case 0x6789:                /* shutdown*/
                sys_kill(1, SIGKILL);
                sys_kill(-1, SIGKILL);
                printk("System halted\n");
                do_exit(0);
                /* no return*/
            case 0xDEAD:                /* poweroff*/
                apm_shutdown_now();
                printk("APM shutdown failed\n");
        }
...
/*
 * The following routines may need porting on non-IBM PC architectures
 */

void hard_reset_now(void)
{
#ifdef CONFIG_ARCH_IBMPC
    asm("mov $0x40,%ax\n\t"
    "mov %ax,%ds\n\t"
    "movw $0x1234,0x72\n\t"
    "ljmp $0xFFFF,$0\n\t"
    );
#endif
}
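For reference, hard_reset_now() stores 0x1234 at 0040:0072, the conventional BIOS warm-boot flag, and then jumps to FFFF:0000, the CPU reset entry point. The segment:offset arithmetic can be sketched as below; real_mode_phys is a hypothetical helper, and the 20-bit mask reflects 8086/8088 address wraparound.

```c
#include <stdint.h>

/*
 * Physical address of a real-mode segment:offset pair on an 8086/8088:
 * segment * 16 + offset, truncated to the 20 address lines.
 */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

So the flag is written to physical address 0x472 and execution resumes at 0xFFFF0, where the BIOS ROM places its reset jump.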

APM shutdown looks like:

/*
 *  Use Advanced Power Management to power off system
 *  For details on how this code works, see 
 *  http://wiki.osdev.org/APM
 */
void apm_shutdown_now(void)
{
#if defined(CONFIG_APM) && defined(CONFIG_ARCH_IBMPC)
    asm("movw $0x5301,%ax\n\t"
    "xorw %bx,%bx\n\t"
    "int $0x15\n\t"
    "jc apm_error\n\t"
    "movw $0x5308,%ax\n\t"
    "movw $1,%bx\n\t"
    "movw $1,%cx\n\t"
    "int $0x15\n\t"
    "jc apm_error\n\t"
    "movw $0x5307,%ax\n\t"
    "movw $1,%bx\n\t"
    "movw $3,%cx\n\t"
    "int $0x15\n\t"
    "apm_error:\n\t"
    );  
#endif
}

With regards to why the MINIX image is trashed while FAT is not, not sure yet. Possibly some buffers need syncing and the APM/reboot code stops all that.

In order to debug, I would suggest you put some printf or printk code in elkscmd/sys_utils/reboot.c and poweroff.c as well as the routines above so we can see a bit more about what's happening.

Another thought would be to diff the before and after image of fd1440.img, commenting out running /etc/rc.sys first in /etc/inittab. We could then see exactly which blocks are trashed.

ghaerr commented 1 year ago

It basically fails to boot and becomes stuck after displaying the time.

If you uncomment the set -x in /etc/rc.sys we will be able to see where it is freezing during the rc.sys execution.

Vutshi commented 1 year ago

@ghaerr

So the whole original boot problem was crappy CF cards?

Yes, entropy was merciless to a pair of aging CF cards.

With regards to why the MINIX image is trashed while FAT is not, not sure yet. Possibly some buffers need syncing and the APM/reboot code stops all that.

After playing around we concluded that the problem is likely a combination of Minix, sync, and XTIDE.

A simple restart via ctrl+alt+del is harmless. The reboot command seems to differ only by sync() before restarting. A manual sync followed by ctrl+alt+del reproduces the booting problem. We noted that the problem occurs only with some probability if reboot or sync is the first thing done after successful booting.

On the contrary, if we edit the rc.sys file and perform syncing then damage to the Minix filesystem is bigger and booting looks like this: reboot after editing file

We have also reproduced this problem on another computer (Schneider EuroPC) which has a bit older XTIDE BIOS version and it seems to be even more susceptible to the file system damage by syncing. On the other hand, this computer has a proper FDD so we booted ELKS from a floppy and observed no problems whatsoever.

ghaerr commented 1 year ago

Hello @Vutshi,

Thanks for the precisely detailed observations. After some fiddling around, I notice that ELKS will sync two buffers, 1K bytes each, directly after boot regardless of whether /etc/rc.sys is run (I have it commented out in /etc/inittab for testing). These two buffers are (on 1440k image, varies on others): block 1, the super block, and block 12, the block for the /dev/tty1 inode. The super block is always written (I'm looking to change that as it's also written twice in boot I'm seeing - marking and unmarking the MINIX filesystem dirty bit) - and /dev/tty1 is opened by /bin/login for read/write so its access time is updated. The write for block 1 is scheduled during the boot process (twice). The write for block 12 only happens when sync or reboot is executed, with reboot also writing block 1 for the third time.
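To relate those block numbers to disk sectors, here is a small sketch. It assumes 1K filesystem blocks, 512-byte sectors, and a flat (non-MBR) image where block 0 starts at LBA 0; the function names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE      512u  /* assumed disk sector size */
#define MINIX_BLOCK_SIZE 1024u /* 1K filesystem blocks, as described above */

/* First 0-based LBA of filesystem block n on a flat image. */
uint32_t block_to_lba(uint32_t block)
{
    return block * (MINIX_BLOCK_SIZE / SECTOR_SIZE);
}

/* Byte offset of filesystem block n inside the image file. */
uint32_t block_to_offset(uint32_t block)
{
    return block * MINIX_BLOCK_SIZE;
}
```

Under these assumptions block 1 (the super block) covers LBA 2 and 3, which matches physical sectors 3 and 4 mentioned earlier, and block 12 starts at LBA 24, byte offset 0x3000 in the image file.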

Here's my latest guess as to what's happening: since the CF card is actually flash, and messes around internally with its flash-to-sector number ordering (in order to perform wear-leveling on the flash RAM), if a bus or hardware RESET occurs shortly after any CF write-in-progress, the flash-to-sector mapping gets screwed in the CF RAM, and the next boot fails.

Here's the source for reboot:

int main(int argc, char **argv)
{
    sync();
    if (umount("/") < 0) {
        /* -f forces reboot even if mount fails */
        if (argc < 2 || argv[1][0] != '-' || argv[1][1] != 'f')  {
            perror("reboot umount");
            return 1;
        }
    }
    sleep(3);
    if (reboot(0x1D1E,0xC0DE,0x0123)) {
        perror("reboot");
        return 1;
    }
    return 0;
}

The first sync() will cause block 12 (/dev/tty1) to be written. This is also the case when running sync.

In the reboot case, the call to umount("/") will unmount root and mount again as read-only, which causes block 1 (super block) to be synced. It has an internal sync so another sync() isn't required.

I'm not sure why sleep(3) isn't long enough; perhaps your CF needs more. I would suggest experimenting with running sync right after boot up, then CTRL-ALT-DEL 20 seconds later, versus right away. That should help us determine if my guess is correct as to what may be happening! I suppose it is possible that there needs to be another call to the BIOS to tell it to wait for I/O completion on flash, but I'm not sure what that might be. Perhaps it's time to look at the XT IDE source.

Vutshi commented 1 year ago

@ghaerr

I would suggest experimenting running sync right after boot up, then CTRL-ALT-DEL 20 seconds later, versus right away.

We'll carry out the experiments later today.

Are there any obvious disparities in the synchronization of FAT and Minix file systems that could account for the distinct behavior?

Vutshi commented 1 year ago

Hi @tt1542

I wonder whether you can observe the same problem https://github.com/ghaerr/elks/issues/1619#issuecomment-1691733230 with sync and reboot on Minix ELKS in your Book8088?

ghaerr commented 1 year ago

Are there any obvious disparities in the synchronization of FAT and Minix file systems that could account for the distinct behavior?

Yes - FAT doesn't have a superblock and fakes /dev entries so neither block 1 nor block 12 (/dev/tty1) get synced on a sync command like MINIX does. FAT uses a 'fat' (file allocation table) table which is updated at the time of file close. I will confirm the exact differences shortly, but this also helps the current theory that the problem is related to the timing of the last CF card write and resetting the machine/BIOS.

ghaerr commented 1 year ago

@Vutshi: I have confirmed there is disk write activity on FAT filesystems on sync or reboot after a FAT boot.

ghaerr commented 1 year ago

[EDIT: @Vutshi: Oops - I have confirmed there is NO disk write activity on FAT filesystems on sync or reboot after a FAT boot.]

Vutshi commented 1 year ago

@ghaerr

[EDIT: @Vutshi: Oops - I have confirmed there is NO disk write activity on FAT filesystems on sync or reboot after a FAT boot.]

This is an interesting datapoint. However, it cannot explain why FAT functions with XTIDE, as I had to synchronize after modifying rc.sys to preserve the edits. Subsequent reboot shows that FAT ELKS operates smoothly: ELKS FAT 0 7 works

Regarding the experiments with Minix ELKS, we did sync and CTRL+ALT+DEL with various pause durations up to 5 min and it didn't help. More precisely, we performed in sequence: editing of rc.sys, sync, pause, CTRL+ALT+DEL. The result is a damaged filesystem. It is harder to reproduce with immediate syncing after boot, but on the 9th try it also resulted in a damaged filesystem. We also evaluated ELKS v0.6, which seems to function well on Book8088 and experiences the same Minix syncing issue.


During the 0.6 tests, an additional issue peculiar to v0.7 on Book8088 came to light. Specifically, the cursor is not visible within the kilo editor, and it fails to appear in the terminal after exiting kilo. This issue is absent in v0.6 and doesn't occur on another computer running v0.7.

ghaerr commented 1 year ago

it cannot explain why FAT functions with XTIDE

More precisely we performed consecutively editing of rc.sys, sync, pause, CTRL+ALT+DEL. The result is damaged filesystem.

Perhaps it's time to bring out the heavy artillery - perform a hexdump of the .img file, before and after, then run it through diff. That will show us the damaged block as well as what's in it.
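As an alternative to hexdump plus diff, the comparison can be done directly at block granularity. The following is a minimal sketch, assuming both images have been read into memory and have the same length; diff_blocks is a hypothetical helper, not an existing ELKS tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two equally sized images in block_size chunks. Indices of
 * differing blocks are stored in out[] (up to max entries); the return
 * value is the total number of differing blocks found. A trailing
 * partial block is compared as well.
 */
size_t diff_blocks(const unsigned char *before, const unsigned char *after,
                   size_t len, size_t block_size,
                   size_t *out, size_t max)
{
    size_t count = 0;

    if (block_size == 0)
        return 0;
    for (size_t off = 0; off < len; off += block_size) {
        size_t n = (len - off < block_size) ? len - off : block_size;
        if (memcmp(before + off, after + off, n) != 0) {
            if (count < max)
                out[count] = off / block_size;
            count++;
        }
    }
    return count;
}
```

Called with a block size of 1024, the returned indices are 1K block numbers, so they can be compared directly with the block numbers the kernel reports writing during sync.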

an additional issue peculiar to v0.7 on Book8088 came to light. Specifically, the cursor is not visible within the kilo editor, and it fails to appear in the terminal after exiting kilo.

I'll take a look at this. The ANSI cursor on/off sequence was added to v0.7 and that probably has something to do with it.
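For anyone wanting to test this from user space: the widely used ANSI/VT sequences for cursor visibility are ESC [ ? 25 h (show) and ESC [ ? 25 l (hide). That these are the exact sequences handled by the v0.7 console is an assumption, and cursor_seq is a hypothetical helper.

```c
/*
 * Return the escape sequence that shows (visible != 0) or hides
 * (visible == 0) the text cursor, using DEC private mode 25.
 */
const char *cursor_seq(int visible)
{
    return visible ? "\033[?25h" : "\033[?25l";
}
```

Writing cursor_seq(1) to the terminal after exiting kilo would show whether the cursor can be turned back on by hand, which would help narrow the problem down to the editor not restoring it versus the console not honoring the sequence.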

Vutshi commented 1 year ago

@ghaerr

Perhaps it’s time to bring out the heavy artillery - perform a hexdump of the .img file, before and after, then run it through diff. That will show us the damaged block as well as what's in it.

This is what I wanted to do as well. We didn’t do it yet because the way the filesystem is damaged seems to be a bit random. Tomorrow we will look into this.

Another thing I want to try is to take a CF card from a different brand.