SvarDOS / edrdos

Enhanced DR-DOS kernel and command interpreter ported to JWasm and OpenWatcom C
http://svardos.org/

Kernel fails to boot on 86box IBM-PC (1982), works IBM-XT onwards #102

Closed mateuszviste closed 2 months ago

mateuszviste commented 2 months ago

Tested under 86Box with an emulated 8086 PC. When booting from a 720K floppy disk, the kernel displays a warning as shown on the screenshot, and the boot stops (freezes).

Monitor_1_20240821-230828-103103

same story with a different IDE controller:

Monitor_1_20240821-231409-886886

VM configuration:

[screenshot: 86Box VM configuration]

mateuszviste commented 2 months ago

it works when switching the VM to an IBM PC XT model. Everything else is the same (same 720K floppy, same RAM amount, same HDD controller...).

boeckmann commented 2 months ago

Yes, I can confirm. This also happens with the 360K floppy image.

boeckmann commented 2 months ago

@ecm-pushbx is there a chance of getting lDebug to run on a 256k machine? I ran instsect.com on a 360K drive and copied ldebug onto it. It bails out with an "out of memory" message. Screenshot and a 360k floppy image attached with uncompressed single-file edr protocol kernel named drbio.sys (because this one does not relocate).

Bildschirmfoto 2024-08-22 um 17 53 18

ldeb360.img.zip

ecm-pushbx commented 2 months ago

I built a special-purpose binary which works with as little as 192 KiB of memory. (Still fails if only 128 KiB are available.) It's in https://pushbx.org/ecm/test/20240822/

Here's the history of a test I did in qemu (with an EBDA of size 1 KiB):

r v0 = word [0:413]
r word [0:413] = #192
m v0<<6:0 l #1024 #192<<6:0
dw 0:40E l 2
h v0<<6
r word [0:40E] = #192<<6
dw 0:40E l 2
boot protocol ldos smalll.sys
g
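
The shifts by 6 in the commands above convert a memory size in KiB into a segment value. A minimal sketch of that arithmetic, assuming the usual BIOS data area meaning of the two words (0:413 = conventional memory size in KiB, 0:40E = EBDA segment); the function names are made up for illustration:

```python
# Sketch of the unit conversions behind "#192<<6" in the debugger script.
# 1 KiB = 64 paragraphs of 16 bytes, hence the shift by 6.

def kib_to_segment(kib):
    """Segment value that starts at the given KiB boundary (kib << 6)."""
    return kib << 6

def segment_to_linear(segment):
    """Linear address of segment:0 in real mode."""
    return segment << 4

# Value the script writes to word [0:413] (memory size in KiB).
new_size_kib = 192
# Value the script writes to word [0:40E] (EBDA segment): #192<<6.
new_ebda_segment = kib_to_segment(new_size_kib)

print(f"EBDA segment {new_ebda_segment:04X}h, "
      f"linear {segment_to_linear(new_ebda_segment):05X}h")
```

So the 1 KiB EBDA is moved to sit directly above the new 192 KiB memory top.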

Here's the SmallL executable showing its own resident size, again with a 1 KiB EBDA.

 -h word [word [0:413]+1<<6:8]
 1580  decimal: 5504
 -h as paras word [word [0:413]+1<<6:8]
 00015800   86.0 KiB
 -

Same but independent of EBDA, using the DPR variable:

 -h as paras word [dpr-1:8]
 00015800   86.0 KiB
 -
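
The two `h` outputs above show the same number in different units: 1580h paragraphs is 15800h bytes. A small sketch of the conversion, assuming only that the word being read is a size in 16-byte paragraphs (as the "as paras" keyword suggests); the helper names are made up:

```python
# Sketch: convert a size in paragraphs (16-byte units) to bytes and KiB.

def paragraphs_to_bytes(paragraphs):
    """A real-mode paragraph is 16 bytes."""
    return paragraphs * 16

def as_kib(byte_count):
    """Format a byte count as KiB with one decimal place."""
    return f"{byte_count / 1024:.1f} KiB"

resident_paragraphs = 0x1580  # value shown by the first "h" command
resident_bytes = paragraphs_to_bytes(resident_paragraphs)

print(resident_paragraphs, f"{resident_bytes:08X}", as_kib(resident_bytes))
```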

You can use your existing installer as in instsect.com A: /F1=smalll.sys as the boot loading appears to work.

Here's how I created the build:

  1. ./makec (to compile mktables)
  2. ./mktables 8086 (build tables without 186+ level instructions)
  3. INICOMP_METHOD=none ./mak.sh -D_EXTENSIONS=0 -D_BOOT_ENV_SIZE=256 -D_REGSHIGHLIGHT=0 -D_GETLINEHIGHLIGHT=0 -D_REGSLINEBREAK=0 -D_REGSREADABLEFLAGS=0 -D_MS_N_COMPAT=0 -D_MS_0RANGE_COMPAT=0 -D_MS_PROMPT_COMPAT=0 -D_MS_MNEMON_COMPAT=0 -D_VXCHG=0 -D_ALTVID=0 -D_INDOS_PROMPT=0 -D_MCB=0 -D_MMXSUPP=0 -D_DT=0 -D_DSTRINGS=0 -D_DTOP=0 -D_CLEAR=0 -D_RSEPARATE=0 -D_VDD=0 -D_EXTHELP=0 -D_APPLICATION=0 -D_DEVICE=0 -D_RE_BUFFER_SIZE=256 -D_NUM_B_BP=4 -D_NUM_G_BP=4 -D_NUM_B_WHEN_BYTES=256 -D_DELAY_BEFORE_BP=0 -D_HISTORY_SEPARATE_FIXED=0 -D_40COLUMNS=0 -D_DETECT95LX=0 -D_DOSEMU=0 -D_DISASM_32BIT=0 -D_ONLYNON386=1 -D_AREAS=0
ecm-pushbx commented 2 months ago

This is the smallest it can get easily. But it disables some parts you may want to use such as support for Extensions for lDebug.

ecm-pushbx commented 2 months ago

The mak script creates a binary named ldebugu.com but despite the name this is built with -D_APPLICATION=0 -D_DEVICE=0 so it can only be loaded in bootloaded mode. From your description I gather that's what you want.

boeckmann commented 2 months ago

Thanks :) I will probably try this at the weekend, when my mind has the capacity to deal with this :) I am trying to load the EDR kernel under lDebug, with only 256K available. This might not fit (I have not figured out yet how much additional memory the kernel requires apart from itself). Thinking about it, I had better try the uncompressed dual-file drbio.sys, as the drdos.sys part does not seem to be involved in the trouble going on when booted under the IBM-PC vm type.

boeckmann commented 2 months ago

https://github.com/SvarDOS/edrdos/blob/525b418819b1c775afd040f28c0254fb527646ae/drbio/config.equ#L38

256K - 86K (lDebug) - 96K = 74K

Hopefully sufficient that there is not some self-overwriting going on when the kernel relocates (dual-file drbio.sys may work)...
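
The budget above as a tiny sketch. The 86 KiB figure is the resident size reported for the SmallL build earlier in the thread; treating the 96 KiB figure as the space the kernel needs is an assumption based on the linked config.equ line, and the helper name is made up:

```python
# Sketch of the memory budget on a 256 KiB machine (all values in KiB).

def remaining_kib(total, *consumers):
    """Memory left after subtracting each consumer from the total."""
    return total - sum(consumers)

TOTAL_KIB = 256
LDEBUG_KIB = 86   # resident size of the SmallL debugger build
KERNEL_KIB = 96   # figure from the post; its exact meaning is an assumption

print(remaining_kib(TOTAL_KIB, LDEBUG_KIB, KERNEL_KIB), "KiB left")
```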

ecm-pushbx commented 2 months ago

Be sure to use boot protocol edrdos . // if you don't have a drdos.sys file, or boot protocol freedos segment=70 drbio.sys

boeckmann commented 2 months ago

That looks very promising. I can boot EDR on an IBM-XT vm under lDebug with 256k RAM (dual-file, uncompressed kernel).

But guess what: the kernel also boots perfectly fine on an IBM-PC vm under lDebug. Another case where the observed object behaves differently under observation :/

boeckmann commented 2 months ago

@ecm-pushbx if I boot the kernel without setting the carry flag on int 3, so that the kernel intercepts interrupts 0, 1 and 3 by itself, is there still anything lDebug does in the background, or is it simply a "space waster" after the kernel has been booted?

If it does nothing anymore, the problem may be in the handover chain from BIOS -> bootloader -> kernel, with some value not expected by the kernel...

boeckmann commented 2 months ago

Addition to the previous post: under the assumption that no breakpoints etc. are set by the user...

ecm-pushbx commented 2 months ago

> @ecm-pushbx if I boot the kernel without setting the carry flag on int 3, so that the kernel intercepts interrupts 0, 1 and 3 by itself, is there still anything lDebug does in the background, or is it simply a "space waster" after the kernel has been booted?

Pretty much, yes. It will still respond to int 18h, int 19h, or int 06h (not on HP 95LX) if these are either not hooked or restored at a later point.

> If it does nothing anymore, the problem may be in the handover chain from BIOS -> bootloader -> kernel, with some value not expected by the kernel...

You can prepare a boot sector file to chainload, eg using instsect as in instsect A: /B=bootsect.dos /BO to save the current boot sector loader into a file. Then in the booted lDebug run boot protocol chain and you can trace the original boot loader. This may help pin down a problem between the loader and the kernel, or it may not.

If it doesn't I'd suggest you write a small bootloader that just displays a hexdump of itself (including BPB) and all register and flags values at its entry, so that you can recreate the same when tracing with lDebug.

ecm-pushbx commented 2 months ago

If it makes any difference (possible!) you can boot lDebug (off fdb or hda1, or off fda then switch the diskette image loaded into fda) and run boot fda (same as boot protocol sector fda) which will have lDebug imitate what it assumes the ROM-BIOS would do to load the boot sector, including to read the sector off the actual unit.

ecm-pushbx commented 2 months ago

> If it doesn't I'd suggest you write a small bootloader that just displays a hexdump of itself (including BPB) and all register and flags values at its entry, so that you can recreate the same when tracing with lDebug.

Added in https://pushbx.org/ecm/test/20240823/

This is a small test loader that can be installed using instsect A: /S12=test12.bin (only the informational FAT type string is FAT12-specific). It was built using nasm -I ../lmacros/ -D_FAT12 testboot.asm -o test12.bin from the latest revision of https://hg.pushbx.org/ecm/ldosboot.exp/rev/4123bf41df1f

It starts out displaying register values, then will prompt a few times for a keypress to display more of the boot sector hexdump. After the last data from the sector has been displayed it will wait for another keypress and then run an int 19h.

boeckmann commented 2 months ago

Does lDebug have some heuristics on when to skip showing instructions? It sometimes fails to display instructions (although they are executed). For example, in the following screenshot an LEA instruction that sets DI to 100h is missing. It is executed but not displayed. But to my understanding it should be (I entered a "p" and then simply pressed return multiple times to step through the following instructions)... See also the increased IP.

Layer 8 error?

Bildschirmfoto 2024-08-23 um 14 39 53

Another one. Missing a pop ds and a mov si,... :

Bildschirmfoto 2024-08-23 um 14 43 40
ecm-pushbx commented 2 months ago

Try running r dao or= 80 to set the Debugger Assembler option: "80 Disassembler: NEC V20 repeat rules (for segregs)". This will make the debugger repeat disassembly on T/TP/P/G/R register dumps if the first instruction writes to any segreg. So it will still execute them in a single trace/proceed step but the debugger will disassemble the following instruction.

It seems likely that this is the cause of your problem as in all three cases the prior instruction was a pop es or pop ds. The next instruction being run immediately on a single trace step is an effect of the interrupt lockout, which is only really needed for writing ss so that another instruction immediately after which writes sp will always be run together with the write to ss (as the 8088/8086/186/286 didn't have lss sp). However, the NEC V20/V30 and the VM you use both seem to apply the lockout to any mov/pop to a segment register.

ecm-pushbx commented 2 months ago

For context of what instructions cause the repetition with DAO 80:

boeckmann commented 2 months ago

Interestingly the bootsector provided by FreeDOS SYS fails with message ".Error!" if chainloaded via boot protocol chain. It works better if booted via boot fda.

boeckmann commented 2 months ago

Kernel boots fine via boot protocol edrdos even if I match the register values to those provided by the FreeDOS bootloader upon kernel start.

boeckmann commented 2 months ago

The kernel also boots fine if the FreeDOS bootloader is executed via lDebug boot fda.

ecm-pushbx commented 2 months ago

> Interestingly the bootsector provided by FreeDOS SYS fails with message ".Error!" if chainloaded via boot protocol chain. It works better if booted via boot fda.

Perhaps sector reads can sometimes fail? That'd at least explain why using the boot protocol edrdos command always seems to work, and would also explain the FreeDOS loader erroring out.

ecm-pushbx commented 2 months ago

> Interestingly the bootsector provided by FreeDOS SYS fails with message ".Error!" if chainloaded via boot protocol chain. It works better if booted via boot fda.

Can you provide a diskette image with all needed files and list the 86box settings to reproduce? I may try to debug this on the desktop at home, I think I have 86box on there (running on an amd64 Debian Linux host).

boeckmann commented 2 months ago

Sure. I attach two images (zipped) to this post. The first one, sysbs.img, has the FreeDOS bootloader installed into sector 0, which at least manages to boot into the kernel. The second image, chainld.img, boots lDebug and includes the boot sector from the first image as bootsect.dos in the root directory. This fails if I load it with boot protocol chain bootsect.dos, outputting the following. See @mateuszviste's first post for the machine configuration (screenshot). Note: I left the speed at 4.77 MHz (did not make any difference).

Bildschirmfoto 2024-08-23 um 16 35 17

Here the INT13 fails. Register values look good though:

Bildschirmfoto 2024-08-23 um 16 55 02
boeckmann commented 2 months ago

The exact same chainld.img works with pcjs at https://www.pcjs.org/machines/pcx86/ibm/5150/cga/

Bildschirmfoto 2024-08-23 um 17 04 16

However, booting the kernel still crashes (in another way) when booted directly via FreeDOS bootsect.

boeckmann commented 2 months ago

And kernel boots under the pcjs XT type https://www.pcjs.org/machines/pcx86/ibm/5160/cga/ with the FreeDOS loader.

ecm-pushbx commented 2 months ago

Just as a hint, you can boot protocol chain without the filename which will default to bootsect.dos

boeckmann commented 2 months ago

I found something out: the int 3 in init.asm triggers this: https://github.com/SvarDOS/edrdos/blob/525b418819b1c775afd040f28c0254fb527646ae/drbio/init.asm#L978

If I comment it out it seems to work as expected. This also explains why it works under the debugger. It works now in 86box and with pcjs.

I now will have a look at why exactly this fails.

boeckmann commented 2 months ago

Looks like the INT 3 vector contains no sensible value on the original IBM-PC. Under PCjs it is set to 0:0. Cannot confirm yet that this is also true for 86box, as it behaves differently (both work with the INT 3 commented out). Is it possible to extend your test12 test program to also dump the first bytes of the IVT?
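
For reference, each real-mode IVT entry is 4 bytes at 0:(n*4), stored as offset word followed by segment word. A minimal sketch of decoding a raw IVT image into segment:offset form; the function names and the sample data are made up for illustration and are not taken from any real machine dump:

```python
import struct

def read_vector(ivt, intno):
    """Return (segment, offset) of interrupt `intno` from a raw IVT image."""
    # Each entry is little-endian: offset word first, then segment word.
    offset, segment = struct.unpack_from("<HH", ivt, intno * 4)
    return segment, offset

def dump_ivt(ivt, count=32):
    """Format the first `count` vectors as 'int NNh = SSSS:OOOO' lines."""
    lines = []
    for intno in range(count):
        segment, offset = read_vector(ivt, intno)
        lines.append(f"int {intno:02X}h = {segment:04X}:{offset:04X}")
    return lines

# Made-up sample: an all-zero IVT with only int 10h pointing somewhere.
sample_ivt = bytearray(1024)
struct.pack_into("<HH", sample_ivt, 0x10 * 4, 0xF065, 0xF000)

for line in dump_ivt(sample_ivt)[:4]:
    print(line)
```

In this sample, int 03h decodes to 0000:0000, the kind of vector that is not safe to call.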

boeckmann commented 2 months ago

I am wondering whether the INT 3 call should be guarded by a DEBUG build flag, so that it is disabled for normal builds.

ecm-pushbx commented 2 months ago

> Looks like the INT 3 vector contains no sensible value on the original IBM-PC. Under PCjs it is set to 0:0. Cannot confirm yet that this is also true for 86box, as it behaves differently (both work with the INT 3 commented out).

Good catch!

> Is it possible to extend your test12 test program to also dump the first bytes of the IVT?

Yes, I updated it to dump the first 32 IVT entries (int 00h to 1Fh) in segment:offset format in https://hg.pushbx.org/ecm/ldosboot.exp/file/1c89531daf16/testboot.asm I also updated the binary in https://pushbx.org/ecm/test/20240823/

> I am wondering whether the INT 3 call should be guarded by a DEBUG build flag, so that it is disabled for normal builds.

This is what we did in FreeDOS: https://github.com/FDOS/kernel/blob/1cc00e194dd969d30c78775c67a1df44af307abf/kernel/kernel.asm#L80 The check debugger option is by default disabled, skipping the int3 breakpoint in the init. EDR-DOS doesn't have a CONFIG or lCFG or comparable patchable block yet, so it is unclear what to do. Add one? Or just disable the check at build time? Or validate the int 3 handler address by default before calling it, ie checking that it isn't segment = 0, isn't offset = FFFFh, and points at a linear address >= top of LMA (taking into account int 12h or RPL reserved area) and < 10_0000h.
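
The validation idea in the last sentence can be sketched as follows. This is only an illustration of the three listed conditions, not the check that was later implemented; it assumes the top of the LMA is given as the int 12h value in KiB and ignores the RPL reserved area, and the function name is made up:

```python
def int3_vector_looks_valid(segment, offset, lma_top_kib):
    """Heuristic: does the int 3 vector plausibly point at a debugger?

    lma_top_kib is assumed to be the int 12h memory size in KiB
    (an RPL reserved area is not considered in this sketch).
    """
    if segment == 0:          # e.g. the 0:0 vector seen under PCjs
        return False
    if offset == 0xFFFF:      # uninitialised-looking offset
        return False
    linear = (segment << 4) + offset
    lma_top = lma_top_kib * 1024
    # Handler must live at or above the top of low memory, below 1 MiB.
    return lma_top <= linear < 0x100000

# A handler located at 3000:0100 on a machine reporting 192 KiB.
print(int3_vector_looks_valid(0x3000, 0x0100, 192))
# The 0:0 vector of the original IBM PC case.
print(int3_vector_looks_valid(0x0000, 0x0000, 256))
```

Note that a vector pointing into the ROM-BIOS area would also pass these three conditions, which is one reason a stricter test can be useful.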

ecm-pushbx commented 2 months ago

Additional possible check, lDebug's int 3 handler always uses a standard IISP header: https://hg.pushbx.org/ecm/ldebug/file/9316c0cfe06a/source/run.asm#l6098 Testing for this would tie the check closer to lDebug but would make it more resilient against false positives.
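
A sketch of what testing for an IISP header could look like. The layout used here is the one I recall from the IBM Interrupt Sharing Protocol (short jump over an 18-byte header, old vector, "KB" signature word, flags byte, short jump to a reset routine, 7 reserved bytes); treat it as an assumption and verify it against the linked lDebug source. The function name and sample bytes are made up:

```python
IISP_HEADER_SIZE = 18  # assumed total header size before the real handler code

def looks_like_iisp(entry):
    """Check the bytes at a handler's entry point for an IISP-style header."""
    if len(entry) < IISP_HEADER_SIZE:
        return False
    # Offset 0: "jmp short" over the rest of the header (EB 10).
    jumps_past_header = entry[0] == 0xEB and entry[1] == 0x10
    # Offset 6: signature word 424Bh, stored little-endian as "KB".
    has_signature = entry[6:8] == b"KB"
    return jumps_past_header and has_signature

# Made-up sample header: jump, old vector, signature, flags, reset jump, padding.
sample = (bytes([0xEB, 0x10]) + bytes(4) + b"KB" + bytes([0x00])
          + bytes([0xEB, 0x00]) + bytes(7))

print(looks_like_iisp(sample))      # header present
print(looks_like_iisp(bytes(18)))   # all zeros, no header
```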

ecm-pushbx commented 2 months ago

I just updated patchpro to allow it to recognise lCFG blocks on any dword boundary within a file's first 8 KiB, rather than only paragraph boundaries.

This could be useful to place an lCFG block (currently 32 bytes in size) near the beginning of SvarDOS flavoured kernels. I'm considering putting it in place of two device driver headers of 18 bytes each to have it stay in the initial part of the kernel file. The kernel would then later in its init overwrite the lCFG block with the device headers copied from a temporary location.

lDOS flavoured kernels can store an lCFG block either in the uncompressed header of inicomp (already implemented) or in drkernpl's beginning (not yet). It can be passed to the kernel on the stack (also not yet).

ecm-pushbx commented 2 months ago

For now I'd suggest to just go with the build option but we can revisit this at a later time.

boeckmann commented 2 months ago

Yes, the build option is the safest option for the moment. Then I have time to figure out what an lCFG block is, and what patchini is for :)

ecm-pushbx commented 2 months ago

=) lCFG block is my alternative to the FreeDOS CONFIG block. FreeDOS CONFIG must be at offset 0 in the file for now, which I didn't like. In designing the lCFG block I also took care to add a bitmap of supported bytes. Like the CONFIG block each configuration item is "identified" only by its position in the block. Unlike the CONFIG block, lCFG's bitmap means the kernel can advertise support for every single byte (up to 64 bytes) individually. For FreeDOS, the CONFIG block only stores a single "length" of how many bytes are supported, so you can't really advertise that an earlier positioned byte isn't supported by a particular kernel.
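
To illustrate the difference between the two schemes: with a single length, every byte before the length counts as supported, while a bitmap can mark individual positions. This is only a conceptual sketch; the bit order and the names are hypothetical and do not describe the actual lCFG or CONFIG field layout:

```python
MAX_ITEMS = 64  # the post mentions up to 64 individually advertised bytes

def supported_by_length(length, index):
    """CONFIG-style: every position below the stored length is supported."""
    return 0 <= index < length

def supported_by_bitmap(bitmap, index):
    """lCFG-style: one bit per position (bit order here is hypothetical)."""
    return 0 <= index < MAX_ITEMS and bool((bitmap >> index) & 1)

# Hypothetical kernel that supports items 0 and 2 but not item 1.
bitmap = 0b101
length = 3  # the smallest length that still covers item 2

for index in range(3):
    print(index,
          supported_by_length(length, index),
          supported_by_bitmap(bitmap, index))
```

With the length scheme, item 1 is reported as supported even though this kernel does not implement it; the bitmap reports it correctly.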

lCFG blocks are only used by lDOS inicomp so far, and 3 bytes are used. Each byte has the same meaning, one each for application mode, device mode, and bootloaded operation. The byte value indicates what style of depack progress display to use.

ecm-pushbx commented 2 months ago

patchpro is the canonical example of accessing the lCFG block, so as to display or set the progress display variant for lDOS inicomp. The other tool in patchini, patchqry, is unrelated to lCFG blocks but rather deals with the query patch to patch the behaviour of lDOS iniload / a loaded kernel in bootloaded mode.

boeckmann commented 2 months ago

> Yes, I updated it to dump the first 32 IVT entries (int 00h to 1Fh) in segment:offset format in https://hg.pushbx.org/ecm/ldosboot.exp/file/1c89531daf16/testboot.asm I also updated the binary in https://pushbx.org/ecm/test/20240823/

Nice! This is really helpful to get an overview of the system state right after boot!

boeckmann commented 2 months ago

There is also the kernflg which could be used to enable this kind of thing: https://github.com/SvarDOS/edrdos/blob/525b418819b1c775afd040f28c0254fb527646ae/drbio/init.asm#L203-L208

There are five bits left. Could be used as a middle way between the build time flag and your more sophisticated solution.

boeckmann commented 2 months ago

For the time being, I decided in favour of the kernflg solution. Byte 5, bit 2 has to be set to enable debugger interception. A little tool would come in handy. But rather than implementing that, a more general solution like the one @ecm-pushbx proposed is desirable, because the config space is currently one byte... Not that urgent right now...

However, the bug causing this issue should be gone. So closing this.

ecm-pushbx commented 2 months ago

> Looks like the INT 3 vector contains no sensible value on the original IBM-PC. Under PCjs it is set to 0:0. Cannot confirm yet that this is also true for 86box, as it behaves differently (both work with the INT 3 commented out).

Is the vector 0:0 for the affected 86box machines too? I will prepare a patch that allows setting a "check only if vector appears valid" flag in the check debugger byte of the lCFG block.

ecm-pushbx commented 2 months ago

> Is the vector 0:0 for the affected 86box machines too? I will prepare a patch that allows setting a "check only if vector appears valid" flag in the check debugger byte of the lCFG block.

Added in https://hg.pushbx.org/ecm/edrdos/rev/1e453d972df2