ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

panic: divide fault #2070

Closed toncho11 closed 2 weeks ago

toncho11 commented 2 weeks ago

Not sure if it is a problem or if it should be like that, but if you enable/uncomment in /bootopts both:

ghaerr commented 2 weeks ago

You mean like this:

    task=6 buf=8 cache=4 file=20 inode=24 heap=15000 n # min w/no rc.sys
    #sync=30
    #init=/bin/init 3 n # muser serial no rc.sys
    init=/bin/sash

I can't get this to repeat on QEMU. It tries to run /bin/sash with n as an argument and says "n: no such file or directory" after mounting VFS root, which although not booting is correct with /bin/sash. Are you running on real hardware? Which format disk/.config?

toncho11 commented 2 weeks ago

Yes, that's it, on a real 8 MHz CPU. On FAT.

ghaerr commented 2 weeks ago

Which CPU and system exactly? 8088 IBM 5150?

Just tried the same test on QEMU w/FAT and same result for me, just "n: no such file or directory" correct behavior.

You might want to try just the line init=/bin/sash n with the other commented out, and various variations to see when the error goes away, since I can't repeat it over here. Thank you.

ghaerr commented 2 weeks ago

Here is a screenshot of QEMU with the two lines uncommented. I will need more exact details of your system and a screenshot to debug further, since I can't replicate it yet.

Screen Shot 2024-10-07 at 11 15 12 PM

It may also be worthwhile testing on another emulator.

toncho11 commented 2 weeks ago

I replicated the same setup in the 86Box emulator (Amstrad 1640, 8 MHz, VGA card) and I get the error. This is the config file for 86Box: amstrad 86box.zip. I used a FAT HDD image for ELKS.

It is time for sleep here in Europe, I think you can understand this now :)

toncho11 commented 2 weeks ago

And this is the FAT image: hd32-fat_kernel_panic.zip. It already panics, but you can mount it.

toncho11 commented 2 weeks ago

86Box has a macOS version: https://github.com/86Box/86Box/releases/tag/v4.2.1 Hope you can run it.

I mean tomorrow :)

ghaerr commented 2 weeks ago

I downloaded the macOS 86box as well as your 86box.cfg, but it is saying that I need to download ROM files in order to run the emulator. I downloaded the ROM set which is 25+ files. Do you have any instructions as to how to select one of the many ROMs in order to get it to run?

ghaerr commented 2 weeks ago

@toncho11, actually it just started running after renaming the ROM set to "roms". However, it seems to only want to boot a floppy image. I will play around with figuring out how to configure it to boot your image.

ghaerr commented 2 weeks ago

Ok, it somehow created a completely blank hd32-fat.img. I replaced that with yours and the emulator booted up ELKS. However, 86Box is showing exactly what QEMU is: "n: no such file or directory"!!!

Screen Shot 2024-10-08 at 9 11 20 AM

ghaerr commented 2 weeks ago

I am guessing perhaps the ROM used for emulation may be why the results are different? I am not yet sure how to specify the working directory of the emulator and am running it from my desktop, along with the 86box.cfg and hd32-fat.img, all on the desktop. The 86box.cfg specifies bios = xt. I can't see ways to verify this information when the emulator is running.

toncho11 commented 2 weeks ago

Go to Settings, then Machine, and select "Amstrad1640", 8 MHz, 640 KB, Trident VGA card/Paradise PVGA1A. It should use the Amstrad 1640 ROM. In my case I see the same ROM as on the real machine.

Actually you should be able to use my .cfg file that I provided earlier and put the image in the same folder as the 86box executable.

toncho11 commented 2 weeks ago

Select machine type 8086 and then Amstrad 1640.

toncho11 commented 2 weeks ago

Select XTIDE as hard drive. And select existing drive from "Hard Disks".

toncho11 commented 2 weeks ago

I use the same image and it is as you reported. Now download this image: hd32-fat.img

Change the two lines. Use "shutdown -r".

And then it does the panic.

toncho11 commented 2 weeks ago

After a hard reset ... the problem is as you say (no panic). So it must be the shutdown command????

ghaerr commented 2 weeks ago

Change the two lines. Use "shutdown -r". And then it does the panic.

Ok, I got the panic, finally.

After a hard reset ... the problem is as you say (no panic).

Yes, upon rebooting the same image just modified above, no panic.

So it must be the shutdown command????

Not sure yet. I'm thinking it might be something to do with the kernel exit procedure when sash n exits with an error, but still works after cold boot. I'll look into this further.

Does your real hardware always panic, or can you get it to give the "n: no such file or directory" ever?

toncho11 commented 2 weeks ago

Far fetched but: maybe shutdown sends park heads which does something that survives the soft reboot. Maybe something that stays in the XTIDE during a soft reboot.

toncho11 commented 2 weeks ago

Also shutdown should be doing the sync, right? I am not doing it because I have assumed that it does.

toncho11 commented 2 weeks ago

This time I did sync and Ctrl+Alt+Del. I got the panic. Powering off and on does put the computer in the "No such file" state. It looks permanent. Tested on the real Amstrad.

ghaerr commented 2 weeks ago

I can only get this to fail using your hd32-fat.img. Did you build that yourself? Which commit is it from? When I build the latest and uncomment the two lines then shutdown -r, it doesn't fail with panic!

If you built it yourself, please post your .config unless you're using the standard one ibmpc-1440-nc.config, built for 360k with "make images".

ghaerr commented 2 weeks ago

I am thinking the reason this is happening has something to do with the /bootopts file: on a second (warm) boot, the memory segment that /bootopts is loaded into is NOT zeroed, but instead the new copy of /bootopts is read on top of the old one. If the /bootopts were shorter (which it is with the removal of two '#' chars), there is a possibility for some garbage at the end, particularly the last two chars of the previous /bootopts; this could be causing a crash. This is a possibility since it appears that /bin/sash never runs, which is why we don't see the "no such file" message.

In order to test this theory, after uncommenting the two lines, go to the end of the /bootopts and add one more line commented out with whatever you want in it - this would make the /bootopts file larger than before. If this changes the outcome, then we are on to something.
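The theory above can be illustrated with a minimal C sketch: a shorter file copied over a longer one in a buffer that is not cleared leaves the tail of the old contents visible to anything that scans up to the first NUL. The buffer size and all function names here are hypothetical and are not taken from the ELKS source; this only models the effect, not the actual /bootopts loader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPTS_MAX 64  /* hypothetical buffer size, not the ELKS value */

/* Stand-in for the memory segment that survives a warm boot. */
static char opts_buf[OPTS_MAX];

/* Copy a "file" into the buffer WITHOUT clearing it first.
 * No terminator is written, so older bytes past the end remain. */
size_t load_opts_no_clear(const char *file)
{
    size_t len = strlen(file);
    assert(len < OPTS_MAX);
    memcpy(opts_buf, file, len);
    return len;
}

/* Same copy, but zero the whole buffer first so nothing old survives. */
size_t load_opts_cleared(const char *file)
{
    memset(opts_buf, 0, sizeof opts_buf);
    return load_opts_no_clear(file);
}

/* What a parser that reads up to the first NUL byte would see. */
const char *opts_as_parsed(void)
{
    return opts_buf;
}
```

For example, loading "#sync=30\n#init=x\n" and then the two-characters-shorter "sync=30\ninit=x\n" with load_opts_no_clear() leaves the old trailing "x\n" after the new text, while load_opts_cleared() does not.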

ghaerr commented 2 weeks ago

In order to test this theory

Tested - and no change, still panic. So I can still only get this to fail using your hd32-fat.img. When I create hd32-fat.img using "make images" using the ibmpc-1440-nc.config, I can't get it to panic.

This isn't a high priority bug, since /bin/sash n will always fail to boot anyway, but it'd be nice to know why it's happening. We need to find a way to create a failure from a standard build, or a known commit.

toncho11 commented 2 weeks ago

I am using the hd32-fat image from your fdisk PULL request https://github.com/ghaerr/elks/actions/runs/11219801823

I thought that a soft boot still clears the memory.

toncho11 commented 2 weeks ago

The general understanding is that a soft boot clears the memory. No doubt about it. Some initialization steps of the hardware might be skipped though.

ghaerr commented 2 weeks ago

@toncho11:

I am using the hd32-fat image from your fdisk PULL request https://github.com/ghaerr/elks/actions/runs/11219801823

That's the latest version - very strange: when I build that version using ibmpc-1440-nc.config, which is the same as used for the automated CI build, I can't get the image to fail! There's something going on here which we still haven't uncovered...

ghaerr commented 2 weeks ago

@toncho11,

After playing around with this for several hours (far too long?), I am thinking the issue could well be a bug in the Amstrad PC 1640 BIOS. This is because, for instance, when I change the machine ROMs from Amstrad 1640 to Compaq DeskPro, use your hd32-fat.img and make the same changes you've made, the emulation works with the Compaq DeskPro BIOS, whereas the emulation using the Amstrad BIOS fails.

Since no other emulator actually uses the Amstrad ROM BIOS, this could very well be the reason for the real bug - the Amstrad BIOS may be buggy after a warm start, and cause a Divide Error. Note that the Divide Error trap was only added in a recent version of ELKS; before that, no error was reported by either the BIOS or ELKS. Thus previously, this could have been happening on warm starts and there would never be an error shown.

So, unless we can come up with another emulator that shows this error, or you can show it on a non-Amstrad BIOS being emulated by 86Box, I will stop work on this, since it takes quite a bit of time, 2-3 minutes for each boot run (and I still can't reproduce it from my build, for yet another unknown reason).

Thank you!

toncho11 commented 2 weeks ago

Thank you! I was thinking in the beginning that it might be linked to a bigger issue like a new divide code or something, but now I agree with you.

Thank you!!!

Enjoy your trip! :)

ghaerr commented 2 weeks ago

@toncho11,

that it might be linked to a bigger issue like a new divide code

The panic is brought about because ELKS now traps HW INT 0 which is produced on divide by zero or a divide overflow. Thus, the likely Amstrad BIOS bug is exposed and system operation is discontinued since a numeric result is incorrect (somewhere, likely in BIOS ASM code). There are some who think that a system panic might not be the best option, however.

For the 8086 and 8088 CPUs only, which includes your Amstrad 1640, on divide fault the chip pushes the address past the DIV instruction, allowing for a simple IRET instruction to resume operation. Most BIOSes fill all 256 interrupt vectors with an IRET, and that's likely what the Amstrad BIOS does, so you'd never see anything (and neither did Amstrad). This CPU behavior was changed in the 80186 and up to push the beginning of the DIV instruction, and a simple IRET would end up in a silent busy loop.
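The difference in pushed return address can be modeled with a small C sketch. This is only a toy simulation under stated assumptions: the DIV address and length are made-up values, the enum and function names are hypothetical, and the handler is assumed to be a bare IRET that simply resumes at whatever address the CPU pushed.

```c
#include <stdbool.h>

enum cpu_model { CPU_8086, CPU_80186_PLUS };

#define DIV_ADDR 0x100u  /* hypothetical address of the faulting DIV */
#define DIV_LEN  2u      /* hypothetical encoded length of that DIV */

/* Address the CPU pushes when the DIV at DIV_ADDR faults:
 * past the instruction on 8086/8088, at its start on 80186 and up. */
unsigned pushed_return_ip(enum cpu_model cpu)
{
    return cpu == CPU_8086 ? DIV_ADDR + DIV_LEN : DIV_ADDR;
}

/* Run with a handler that is a bare IRET.
 * Returns true if execution moves past the DIV within max_faults
 * faults, false if it is still re-executing the same DIV. */
bool resumes_past_div(enum cpu_model cpu, unsigned max_faults)
{
    unsigned ip = DIV_ADDR;
    for (unsigned faults = 0; faults < max_faults; faults++) {
        if (ip != DIV_ADDR)
            return true;            /* got past the DIV */
        ip = pushed_return_ip(cpu); /* fault, then IRET pops this address */
    }
    return ip != DIV_ADDR;
}
```

In this model resumes_past_div(CPU_8086, 1000) is true after a single fault, while resumes_past_div(CPU_80186_PLUS, 1000) stays false no matter how large the limit is, which corresponds to the silent busy loop described above.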

For your hardware, if you decide that you don't want to see a panic on (some strangely configured?) reboots, you can disable the ELKS divide fault handler by commenting out the following line in elks/arch/i86/kernel/irq.c:

    int_handler_add(IDX_DIVZERO, 0x00, div0_handler_panic);

This was also debated at the time of adding the handler, but ultimately it was thought best to show an error rather than continue without knowing.
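The trade-off can be sketched with a toy vector table in C: every vector defaults to a handler that returns silently (standing in for a bare IRET), and installing a reporting handler on vector 0 is what makes a divide fault visible. All names below are hypothetical and this is not the ELKS int_handler_add() implementation; it is only a model of the two behaviors being compared.

```c
#include <stddef.h>

#define NVECTORS   256
#define VEC_DIVIDE 0   /* INT 0: divide error */

/* Hypothetical result codes for this sketch only. */
enum { HANDLED_SILENTLY = 0, REPORTED = 1 };

typedef int (*handler_fn)(void);

/* Stand-in for a bare IRET: return without telling anyone. */
static int bare_iret(void) { return HANDLED_SILENTLY; }

/* Stand-in for a handler that reports the fault (ELKS panics here). */
static int report_divide_fault(void) { return REPORTED; }

static handler_fn vectors[NVECTORS];

/* Model of a BIOS pointing every vector at an IRET. */
void fill_default_vectors(void)
{
    for (size_t i = 0; i < NVECTORS; i++)
        vectors[i] = bare_iret;
}

/* Model of the kernel taking over vector 0. */
void install_divide_handler(void)
{
    vectors[VEC_DIVIDE] = report_divide_fault;
}

/* Dispatch vector n and return what its handler did. */
int raise_vector(unsigned n)
{
    return vectors[n % NVECTORS]();
}
```

Skipping install_divide_handler() in this model corresponds to commenting out the registration line: a divide fault then goes to the silent default, as it did before the handler was added.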