Baron-von-Riedesel / HX

Home of the HX DOS Extender and its included DPMI-host HDPMI.

once-off exception #14

Open kerravon86 opened 3 years ago

kerravon86 commented 3 years ago

Hi Japheth. It's good to find you again (we exchanged email on 2009-03-30 and 2006-11-19).

I have a problem with HX 2.18 that also occurred in HX 2.17. Under Freedos (I'm not sure how to tell which version I am using; I see the number 0.84-pre2 printed below), the first program I run that uses HX crashes as shown below, and from then on everything is fine. Any idea what is happening? You can see I am running gccwin.exe, a 3 MB executable (my own version of GCC 3.2.3, built with PDPCLIB). It is a standard Win32 PE executable that depends only on kernel32.dll.

Thanks. Paul.

FreeCom version 0.84-pre2 XMS_Swap [Aug 28 2006 00:29:00]
C:\>prompt $p$g
C:\>path c:\;c:\dos;c:\winpath;c:\path;c:\emx\bin;c:\tc\bin;c:\borland\bin;c:\i16gnu\bin;c:\hx\bin;c:\watcom\binnt
C:\>rem mscdex /d:cdrom
C:\>doskey
DOSKEY features are already enabled in the shell.
C:\>hxldr32
HXLdr32 V1.13 installed. Win32 console apps may possibly run now in DOS
C:\>rem set C_INCLUDE_PATH=c:/emx/include
C:\>rem set LIBRARY_PATH=c:/emx/lib
C:\>rem set CPLUS_INCLUDE_PATH=c:/emx/include/cpp;c:/emx/include
C:\>cd \fixwin
C:\FIXWIN>call fixwin
C:\FIXWIN>gccwin -S temp.c

dkrnl32: exception C0000092, flags=0 occured at BF:4B9E31
ax=400000 bx=0 cx=33B7A0 dx=0 si=797620 di=76C640 bp=326898 sp=326868
ip = Module 'gccwin.exe'+B9E31
[eip] = DF 2C 24 DC 0D 08 9E 4B 00 DA E9 DF
[esp] = 00400000 00000000 00797620 007982C0 00326898 004A75F9
dkrnl32: fatal exit!
C:\FIXWIN>cd \devel\pdos\src
C:\DEVEL\PDOS\SRC>

C:\DEVEL\PDOS\SRC>cd \fixwin
C:\FIXWIN>fixwin
C:\FIXWIN>gccwin -S temp.c
C:\FIXWIN>

Baron-von-Riedesel commented 3 years ago

hello Paul,

Exception 0xC0000092 is STATUS_FLOAT_STACK_CHECK. HDPMI does NOT initialize the FPU, so if a program running BEFORE HDPMI is loaded has left the FPU in an "inconsistent" state, such an error may happen. The exception handler in dkrnl32.dll does an FNINIT on floating-point exceptions, so the error disappears after the first exception.
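The bytes at [eip] in the exception dump decode as DF 2C 24, i.e. FILD qword ptr [esp], an instruction that pushes a value onto the x87 register stack. The C model below is a simplified sketch (not dkrnl32's code; all names are made up for illustration) of how such a push can end in STATUS_FLOAT_STACK_CHECK: if the destination register is still tagged as in use, the FPU signals a stack overflow (IE and SF set, C1=1), and if the invalid-operation exception is unmasked in FCW, that becomes an exception which Windows reports as 0xC0000092. After FNINIT every tag is empty and every exception is masked, so the same push succeeds. The starting state used in the example (FCW=0040h, tag word 5555h) is an assumption chosen to resemble the power-up state Intel documents for Pentium-class and later CPUs; whether Bochs and the BIOS leave exactly that state is not established here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the x87 fields needed to model a stack overflow on push. */
typedef struct {
    uint16_t fcw;  /* control word: bit 0 = IM (invalid-operation mask) */
    uint16_t fsw;  /* status word: IE bit 0, SF bit 6, ES bit 7, C1 bit 9, TOP bits 11-13 */
    uint16_t ftw;  /* tag word, 2 bits per physical register: 00 valid, 01 zero, 10 special, 11 empty */
} x87_model;

#define FCW_IM        0x0001u
#define FSW_IE        0x0001u
#define FSW_SF        0x0040u
#define FSW_ES        0x0080u
#define FSW_C1        0x0200u
#define FSW_TOP_MASK  0x3800u

#define STATUS_FLOAT_INVALID_OPERATION 0xC0000090u
#define STATUS_FLOAT_STACK_CHECK       0xC0000092u

static unsigned fsw_top(uint16_t fsw) { return (fsw >> 11) & 7u; }

/* FNINIT defaults: FCW=037Fh (all exceptions masked), FSW=0 (TOP=0), every tag empty. */
static void model_fninit(x87_model *f) {
    f->fcw = 0x037F;
    f->fsw = 0x0000;
    f->ftw = 0xFFFF;
}

/* Models a push such as FILD. Returns true if the fault is unmasked and
   would therefore be delivered as an exception. */
static bool model_push(x87_model *f) {
    unsigned new_top = (fsw_top(f->fsw) + 7u) & 7u;   /* a push decrements TOP */
    unsigned shift = 2u * new_top;
    bool in_use = ((f->ftw >> shift) & 3u) != 3u;      /* 11b means empty */
    unsigned tag;

    if (in_use) {
        f->fsw |= FSW_IE | FSW_SF | FSW_C1;            /* stack overflow */
        if (!(f->fcw & FCW_IM)) {
            f->fsw |= FSW_ES;
            return true;                               /* unmasked: TOP left unchanged */
        }
        tag = 2u;   /* masked: register receives the QNaN indefinite (tagged special) */
    } else {
        tag = 0u;   /* normal push: register becomes valid */
    }
    f->fsw = (uint16_t)((f->fsw & ~FSW_TOP_MASK) | (new_top << 11));
    f->ftw = (uint16_t)((f->ftw & ~(3u << shift)) | (tag << shift));
    return false;
}

/* Map an invalid-operation fault to an NTSTATUS-style code: SF set means a stack fault. */
static uint32_t classify_invalid(uint16_t fsw) {
    return (fsw & FSW_SF) ? STATUS_FLOAT_STACK_CHECK : STATUS_FLOAT_INVALID_OPERATION;
}
```

For example, starting from `x87_model f = { 0x0040, 0x0000, 0x5555 };`, `model_push(&f)` reports an exception and `classify_invalid(f.fsw)` yields 0xC0000092; after `model_fninit(&f)` the same push goes through. This mirrors the "fails once, then works" behaviour only because the handler's FNINIT leaves the FPU in the default state for the next run.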

kerravon86 commented 3 years ago

exception 0xC0000092 is STATUS_FLOAT_STACK_CHECK. HDPMI does NOT initialize the FPU, so if a program - running BEFORE hdpmi is loaded - has left the FPU in an "inconsistent" state, such an error may happen. The exception handler in dkrnl32.dll does a FNINIT on floating point exceptions, so the error disappears after the first exception.

Hi Japheth.

Thanks a lot for your reply and an explanation.

As far as I can tell, I am not running any floating point application at all before running hxldr32 and a 32-bit application.

I even got rid of "doskey" and still got the exception.

I tried running a 16-bit MSDOS executable that does floating point before running the 32-bit gccwin.exe, and that made the exception go away.

So what I'd like to know is what the rules are. I assume an 80386 processor is allowed to start with the FPU not initialized? In that case, is it the OS (Freedos here) that is responsible for doing an FNINIT before running any applications?

Or is every application that uses floating point required to assume that the FPU is uninitialized and thus needs to do an FNINIT?

I didn't think "gccwin --version" actually did any floating point, but I tried running a different Windows application (with the same C library) and it didn't have a problem, so it seems gccwin does some floating point after all, which is why it is the program that hits the problem. I then tried another program that only does floating point when particular options are used, and I could see that if it did floating point, it got the exception; otherwise it worked fine.

I then ran that same program under PDOS/386 (PD-Windows), and on the first run it produced an incorrect result (there could have been an exception that was ignored by PDOS); the next run worked correctly.

So when an 80386 PC (in my case, Bochs) boots, is there a system requirement that someone do an FNINIT prior to the first floating point instruction, or is it expected to power up in a "ready to use" state?

If someone needs to do an FNINIT, should it be the OS, or is every application responsible for doing its own?

I also discovered that if I run a 16-bit MSDOS program without having done something that invokes the FNINIT, then when I do some floating point it actually hangs under Freedos.

I would thus assume that every application is required to do their own FNINIT, and in practice that means it is the job of the C runtime library to do it on startup. Is that correct?

Thanks. Paul.

Baron-von-Riedesel commented 3 years ago

Or is every application that uses floating point required to assume that the FPU is uninitialized and thus needs to do an FNINIT?

Actually, that would be the safest strategy - but obviously, many apps don't do that. DOS simply doesn't care about the FPU.

In the Jemm package, there's a tool CPUSTAT. Running it should tell you whether the FPU is in a "correct" state (FCW=37F, FSW=0). It also displays the OSXMMEX flag (a value of 1 meaning "FP exceptions for SIMD instructions unmasked"), but in your case it is pretty obvious IMO that it's indeed the old FPU that causes the trouble.
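The "correct" state mentioned here is exactly what FNINIT produces. A small sketch of checking and decoding those two words (the helper names are hypothetical, and this is not CPUSTAT's source): bits 0-5 of FCW are the exception masks (1 = masked), bits 8-9 select the precision and bits 10-11 the rounding mode, while bits 0-5 of FSW are the sticky exception flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define FCW_EXC_MASKS 0x003Fu   /* IM DM ZM OM UM PM, 1 = masked */

/* The post-FNINIT state: FCW=037Fh and FSW=0. */
static bool fpu_state_is_default(uint16_t fcw, uint16_t fsw) {
    return fcw == 0x037F && fsw == 0;
}

/* Exception bits that are NOT masked, i.e. that would trap. */
static unsigned fcw_unmasked(uint16_t fcw) {
    return ~fcw & FCW_EXC_MASKS;
}

/* Precision control, bits 8-9: 00 = 24-bit, 10 = 53-bit, 11 = 64-bit mantissa
   (01 is reserved and reported here as 0). */
static unsigned fcw_precision_bits(uint16_t fcw) {
    static const unsigned bits[4] = { 24, 0, 53, 64 };
    return bits[(fcw >> 8) & 3u];
}

/* Rounding control, bits 10-11: 0 nearest, 1 down, 2 up, 3 toward zero. */
static unsigned fcw_rounding(uint16_t fcw) {
    return (fcw >> 10) & 3u;
}

/* Sticky exception flags still pending in FSW (bits 0-5). */
static unsigned fsw_pending(uint16_t fsw) {
    return fsw & 0x3Fu;
}
```

For example, `fcw_unmasked(0x0040)` yields 0x3F, meaning every x87 exception would trap, whereas after FNINIT `fcw_unmasked(0x037F)` is 0.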

Baron-von-Riedesel commented 3 years ago

I also discovered that if I run a 16-bit MSDOS program without having done something that invokes the FNINIT, then when I do some floating point it actually hangs under Freedos.

Yes, that may happen. The behavior depends on the status of the CR0.NE flag. You might get various kinds of errors (hangs, "parity error" [because INT 02 is invoked by the BIOS]) or nothing at all.

In HX v2.19 I changed the behavior of HDPMI: it no longer touches the CR0.NE flag. Previous versions always set this flag to 1, but this isn't 100% compatible with all DOS extenders.
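The CR0.NE dependence can be summarized in a few lines. With NE=1 the CPU raises exception 16 (#MF) itself; with NE=0 the error is signalled externally and, on PC-compatible hardware, arrives as IRQ 13, which in real mode is INT 75h, whose BIOS handler then invokes INT 02 (hence the "parity error"). In real mode, vector 16 is also INT 10h, the video BIOS entry point, which is one plausible reason a native #MF under plain DOS can hang instead of reporting anything. The sketch below only encodes this decision; the names are illustrative.

```c
#include <stdint.h>

#define CR0_ET 0x10u
#define CR0_NE 0x20u

typedef enum {
    FPERR_NATIVE_MF,  /* CPU raises exception 16 (#MF) directly */
    FPERR_EXTERNAL    /* FERR# -> PIC -> IRQ 13 on PC-compatible hardware */
} fp_error_path;

/* Which path an unmasked x87 error takes, given CR0. */
static fp_error_path fp_error_reporting(uint32_t cr0) {
    return (cr0 & CR0_NE) ? FPERR_NATIVE_MF : FPERR_EXTERNAL;
}

/* Vector reached in real mode with the standard PC PIC setup:
   0x10 (#MF, which collides with the video BIOS INT 10h) or 0x75 (IRQ 13). */
static unsigned fp_error_vector_realmode(uint32_t cr0) {
    return (cr0 & CR0_NE) ? 0x10u : 0x75u;
}
```

In protected mode the IRQ 13 vector depends on how the host has reprogrammed the PICs, so only the real-mode mapping is encoded here.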

kerravon86 commented 3 years ago

Hi Japheth.

Or is every application that uses floating point required to assume that the FPU is uninitialized and thus needs to do an FNINIT? Actually, that would be the safest strategy - but obviously, many apps don't do that. DOS simply doesn't care about the FPU.

I have been told that the following is required to use floating point on an 80386:

a) CR0 bits may need to be set appropriately for the coprocessor type, such as CR0.EM, CR0.ET, CR0.MP, and CR0.NE. These bits select hardware vs. software (emulation), internal vs. external, 287/387/487, and whether exceptions 0x10 (#MF) and 0x07 (#NM) are raised, etc. CR0 is probably 286 protected mode and later. I.e., you probably need to program them all for a hardware, internal coprocessor, except your 8086 would be external? You'd also need to select 287, 387, 487 correctly.

b) I/O ports 0xF0 and 0xF1 need to be written with zero to clear the "coprocessor busy latch" and reset the coprocessor, respectively.

c) FNINIT

d) possibly need FSTSW FCLEX sequence and FNSTCW instruction too
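Item a) can be made more concrete with the rules Intel documents for EM, MP and TS: an x87 (ESC) instruction raises #NM (exception 7) if EM or TS is set, while WAIT/FWAIT raises #NM only if both MP and TS are set. The helper `cr0_for_hardware_fpu` is a hypothetical illustration of the usual settings for an on-chip FPU (EM=0, MP=1, ET=1, TS cleared, NE chosen by the caller); it is not HDPMI's code, and which NE value is appropriate is exactly the compatibility question raised in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE 0x01u
#define CR0_MP 0x02u
#define CR0_EM 0x04u
#define CR0_TS 0x08u
#define CR0_ET 0x10u
#define CR0_NE 0x20u

typedef enum { FP_EXECUTE, FP_RAISE_NM } fp_action;

/* x87 instructions (ESC opcodes D8-DF): #NM if emulation is on or a task switch is pending. */
static fp_action esc_action(uint32_t cr0) {
    return (cr0 & (CR0_EM | CR0_TS)) ? FP_RAISE_NM : FP_EXECUTE;
}

/* WAIT/FWAIT: #NM only when MP and TS are both set; EM does not matter. */
static fp_action wait_action(uint32_t cr0) {
    return ((cr0 & CR0_MP) && (cr0 & CR0_TS)) ? FP_RAISE_NM : FP_EXECUTE;
}

/* Hypothetical helper: typical FPU-related CR0 bits for an on-chip FPU. */
static uint32_t cr0_for_hardware_fpu(uint32_t cr0, bool native_errors) {
    cr0 &= ~(uint32_t)(CR0_EM | CR0_TS | CR0_NE);
    cr0 |= CR0_MP | CR0_ET;
    if (native_errors)
        cr0 |= CR0_NE;
    return cr0;
}
```

Note that none of these bits resets the FPU's own registers; FNINIT (item c) is still what puts FCW, FSW and the tag word into a known state.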

Writing to ports like 0xF0 is way beyond the scope of an application program executing floating point instructions. Shouldn’t HX be doing the above already, and thus wouldn’t it make sense to do an FNINIT to put the FPU into a known state for first-time use? I.e., is there any advantage to leaving it in an unusable state?

Thanks. Paul.

Baron-von-Riedesel commented 3 years ago

Shouldn’t HX be doing the above already, and thus

I vaguely remember that I indeed added such an initialization, but (obviously) reverted this later. There are situations where this init is undesirable; after all, if DOS doesn't init the FPU, it's also not the job of a DOS extender to do so.

kerravon86 commented 3 years ago

Shouldn’t HX be doing the above already, and thus

I vaguely remember that I indeed added such an initialization, but (obviously) reverted this later. There are situations where this init is undesirable;

Perhaps it could be an option to hxldr32? I’d suggest doing the initialization by default so that people are not surprised.

after all, if DOS doesn't init the FPU, it's also not the job of a DOS extender to do so.

That is certainly one way of looking at it. However, I believe HX is way beyond a DOS extender. It is a replacement for Windows. I can run my Windows programs unchanged. You have produced something fantastic.

By the way, you are missing an MSVCRT.DLL, but I can supply that – it is part of PDPCLIB.

Also by the way, maybe you can assess having HX run on PDOS/86 and PDOS/386, which in both cases will give you LFN support and provide two more Windows environments. hxldr32 loads on PDOS/86 without error, but there must be something missing in PDOS/86 that prevents Windows programs from subsequently running. Working on PDOS/386 will require changes to HX to use the 32-bit interface. I’m assuming some of the DLLs will need no change at all, though. My MSVCRT.DLL will need no changes either – it already works on both PDOS/386 and HX under Freedos.

Anyway, if you look at HX as Windows rather than DOS: I just confirmed that if I open a Windows command prompt and run a floating point program that doesn’t do an FNINIT, it works fine. Windows provides a clean environment to the command prompt. I believe that HX should be doing the same, preferably by default (so that you don’t get “bug” reports like mine ever again), but at least as a command line switch so that you can reply “RTFM” to all the people submitting bug reports.

I don’t think it makes a difference whether it is 1 person submitting a bug report or 1 million – it is the principle of who should be doing the required manipulation of I/O ports to get the 80387 coprocessor ready. Who else is going to do this? There’s no-one else. HX is the only thing that can know about I/O ports on an IBM PC, just as real Windows apparently does.

Ringdingcoder commented 3 years ago

Are you talking about an actual 80386 (with 80387, presumably) or about any generic x86 machine ≥ Pentium Pro? Do the current ones still need outs to 0xf0/0xf1? That would be rather surprising to me.

kerravon86 commented 3 years ago

Are you talking about an actual 80386 (with 80387, presumably) or about any generic x86 machine ≥ Pentium Pro? Do the current ones still need outs to 0xf0/0xf1? That would be rather surprising to me.

That is just information I was given. I don’t know how accurate it is or which processors it applies to. However, I do expect HX to run on all of 80386 and above, detecting the exact processor if required, and doing all hardware manipulations required by each individual processor and coprocessor. Windows presumably does exactly the same thing, and it is HX that replaces Windows, not Freedos. There’s no-one else to do this except HX.

Or if Japheth doesn’t want to see this in HX code for whatever reason, we should write and document some other program that activates the coprocessor so that Windows applications are presented with the same thing that Windows provides.

HX is not a mere DOS extender. It is an OEM Windows. It is an important piece of software. God knows how much effort went into creating an OEM Windows. No reason to let it fall short when it comes to floating point. Or basic MSVCRT functionality, so that some Mingw-like programs can run.

HX may have had a limited audience because it is dependent on Freedos limitations, such as the lack of LFN, and I’m not sure whether some people care about real mode being used internally. Those restrictions have been (or are being) lifted with PDOS replacing Freedos.

It doesn’t need to be Japheth personally who does any of this. But it would be good to have some agreement on proper system design. And if HX is made public domain (as PDOS is, and as other work of Japheth is) when the clarified license is released, it will open the door to unrestricted commercial Windows competitors. I see a major milestone within reach. As far as I can tell, almost all the code has already been written. It just needs a bit of glue.

Baron-von-Riedesel commented 3 years ago

Do the current ones still need outs to 0xf0/0xf1?

Yes, it's still common to write to this port in the IRQ 0Dh (INT 75h) handler. HDPMI's default interrupt code does it as well.
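What such an IRQ 0Dh handler does with the ports can be sketched as follows. Since real port I/O needs ring 0 or a DOS environment, `port_out` here is a hypothetical stand-in that merely records each write. The sequence (clear the coprocessor busy latch at 0F0h, then send a non-specific EOI, 20h, to the slave PIC at 0A0h and the master PIC at 20h) follows the description in this thread; the ordering is one common choice, and a real handler would additionally deal with the FPU error itself (the BIOS version also issues INT 02).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the OUT instruction: records writes for inspection. */
typedef struct { uint16_t port; uint8_t value; } port_write;
typedef struct { port_write log[8]; size_t n; } port_recorder;

static void port_out(port_recorder *r, uint16_t port, uint8_t value) {
    if (r->n < sizeof r->log / sizeof r->log[0]) {
        r->log[r->n].port = port;
        r->log[r->n].value = value;
        r->n++;
    }
}

#define PIC_EOI 0x20u  /* non-specific end-of-interrupt command */

/* Port writes of an IRQ 13 (FPU error) handler. */
static void irq13_fpu_error(port_recorder *r) {
    port_out(r, 0xF0, 0x00);     /* clear the coprocessor busy latch */
    port_out(r, 0xA0, PIC_EOI);  /* EOI to slave PIC (IRQ 13 is slave line 5) */
    port_out(r, 0x20, PIC_EOI);  /* EOI to master PIC (slave cascades on IRQ 2) */
}
```

Both EOIs are needed because IRQ 13 arrives through the slave PIC, which is itself chained into the master.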

we should write and document some other program that activates the coprocessor

It's not "activate", just "init". And that's a trivial program, you can write it with DEBUG.COM, it's just three bytes long:

DB E3   fninit
C9      ret
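Note that C9 is the opcode for LEAVE, not RET; a near RET is C3, so a working three-byte program is DB E3 C3. FNINIT is DB E3 (FINIT is the same with a 9B WAIT prefix). RET works as the exit from a .COM program because DOS starts it with a zero word on the stack, so RET jumps to offset 0 of the PSP, where an INT 20h instruction terminates the program. A minimal C sketch for producing the file on any host (the function and file names are illustrative):

```c
#include <stddef.h>
#include <stdio.h>

/* DB E3 = FNINIT, C3 = near RET (C9 would be LEAVE). */
static const unsigned char fninit_com[] = { 0xDB, 0xE3, 0xC3 };

/* Writes the .COM image to path; returns 0 on success, -1 on failure. */
static int write_fninit_com(const char *path) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(fninit_com, 1, sizeof fninit_com, f);
    int closed = fclose(f);
    return (written == sizeof fninit_com && closed == 0) ? 0 : -1;
}
```

For example, `write_fninit_com("FNINIT.COM")` creates the file, which could then be run from AUTOEXEC.BAT before hxldr32.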

kerravon86 commented 3 years ago

Hi Japheth. Please bear with me, I’m still trying to understand the technical issue.

It's not "activate", just "init". And that's a trivial program, you can write it with DEBUG.COM, it's just three bytes long:

DB E3   fninit
C9      ret

Ok, so the missing functionality is just 2 or 3 bytes. The question is – where should these 2 or 3 bytes go?

One option would be to run an fninit.com 3-byte program prior to running hxldr32.

Another option would be to run it after.

Another option would be to make it a Windows executable instead, so it’s no longer 3 bytes, but it is designed to run in protected mode, which may be the only place we care about, and this executable can be run not just in an HX environment but also on PDOS/386 which suffers from the same lack of initialization (currently).

Even leaving aside HX – PDOS/386 (aka PD-Windows) is meant to be a clone of Windows, and it appears to me that something in Windows does an FNINIT, ready for applications to use floating point.

Is there a reason why PDOS/386 should not do an FNINIT? I.e., does it cause an exception in an environment without a coprocessor, so it can’t be done unconditionally?

If it is done (on PDOS/386), should it only be done after writing to 0xf0/0xf1 and any other steps required to initialize the coprocessor?

I note you said it is “init” not “activate”. What is this distinction? Is writing to 0xf0/0xf1 considered to be activation or initialization or something else?

I assume you agree that applications shouldn’t be involved in writing to 0xf0/0xf1. And you also seem to agree that it is still an appropriate thing to be doing. So whose responsibility does it become then?

Surely it is PDOS/386 that is responsible?

And if it is appropriate for PDOS/386 to do it, what about PDOS/86 and Freedos? Those latter two are meant to be compatible with MSDOS, which presumably doesn’t do it either, so maybe we can leave it out of them, and I’m guessing that makes it a responsibility of every DOS executable to write to those ports or whatever is required to activate/init the coprocessor (if present) and switch to emulation if it doesn’t exist. Or maybe DOS executables normally always use emulation.

And then we’re back to HX. Given the answers for both PDOS/386 and PDOS/86 above, where do you think HX fits in? If it is appropriate for PDOS/386 to do an FNINIT, but NOT appropriate for PDOS/86 to do an FNINIT, then that would seem to mean that HX is responsible to fill the gap.

Otherwise Windows floating point executables will work on PDOS/386 but fail on PDOS/86 + HX. And that would seem to be a strange design decision to me, as I see them (and Windows 95 to 10), and Linux Wine, to all be Windows-compatible environments and I’d like to know what the lowest common denominator is and should be.

It appears that the LCD may be “any application that uses floating point is required to do an FNINIT”. But if HX is the only reason for that LCD, maybe the LCD can be lifted.

The big problem I face is that I have authored a C90 library, and I don’t do an FNINIT and I expect floating point to be “optional” and “just work”. I rarely write programs that use floating point. If I have to detect whether an executable is using floating point, and only then do I execute FNINIT, or perhaps I wait until a floating point instruction is executed, trap the signal, and retry it, it is a major change to the current simplicity of PDPCLIB. If I can just unconditionally do an FNINIT at startup, and it has no ramifications, that would be fine I guess. However, if it causes an exception, for which there is no handler in either PDPCLIB or PDOS/386, then that is a big problem to deal with. Especially when I’m running a program that doesn’t actually use any floating point.

However, maybe the correct technical solution is to deal with that “big problem”, and perhaps write a handler for IRQ 13 that prints out “sorry, coprocessor initialization has not been implemented in PDOS/386 yet, as writing to 0xf0/0xf1 etc is a big job for another day”, but then returns to the caller, and every one of my executables prints out that message and then runs successfully, because it is not using floating point.

And if that’s a solution for PDOS/386, maybe that should be a solution for HX too? Except for HX it would seem a different message is appropriate, maybe make the error message say “floating point exception occurred because you didn’t do an FNINIT, but I just did one for you now, so if you simply rerun the application, it will probably work now, why don’t you try it instead of submitting a bug report against HX? It’s designed that way, and I believe it to be correct design, so don’t ask for a change to HX - write your own 3-byte .COM program if you don’t want to see this message for the first run every time – the exact bytes you need are DB E3 C9”.

BFN. Paul.

Baron-von-Riedesel commented 3 years ago

but it is designed to run in protected mode, which may be the only place we care about

ok, but it doesn't matter at all in what mode fninit is executed.

Is there a reason why PDOS/386 should not do an FNINIT? ie does it cause an exception on an environment without

AFAIK no, you can "execute" FNINIT even if no FPU is present. The old DEBUG tool that I maintain does so as well.

I assume you agree that applications shouldn’t be involved in writing to 0xf0/0xf1.

AFAIK, writing to port 0xF0 is done by code handling interrupt request 0Dh, usually together with writing to port 0xA0 and 0x20 (EOI for PICs).

kerravon86 commented 3 years ago

I assume you agree that applications shouldn’t be involved in writing to 0xf0/0xf1.

AFAIK, writing to port 0xF0 is done by code handling interrupt request 0Dh, usually together with writing to port 0xA0 and 0x20 (EOI for PICs).

I have a bit more information about what the technical issue is. As far as I understand, an OS will deliberately put the FPU into an unusable state in order to cause an exception.

As long as there is no exception, floating point registers do not need to be preserved on task switches.

When an exception occurs, the OS handles it, does the FNINIT, and then retries the failing operation.

If an application (like a C runtime library) were to unconditionally do an FNINIT, this would bypass the OS optimization, so this should not be done.

HX is doing the trap, and the initialization, but it seems to be missing the “retry” step.
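The technique described here is usually called lazy FPU context switching. In its common form the OS does not leave the FPU in an unusable state; instead it sets CR0.TS on every task switch, so a task's first x87 instruction raises #NM (exception 7, "device not available") rather than a floating-point error such as STATUS_FLOAT_STACK_CHECK. The #NM handler clears TS, saves the previous owner's FPU state, restores the current task's state (or does an FNINIT on first use), and returns so that the faulting instruction is re-executed. The C simulation below models only that bookkeeping; all names are hypothetical, and the two-field context stands in for a full FSAVE image.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TASKS 4

typedef struct { uint16_t fcw; uint16_t fsw; } fpu_ctx;  /* stand-in for an FSAVE image */

typedef struct {
    fpu_ctx live;               /* contents of the "hardware" FPU */
    bool ts;                    /* models CR0.TS */
    int owner;                  /* task whose state is in the FPU, -1 = none */
    int current;                /* running task */
    fpu_ctx saved[MAX_TASKS];
    bool used[MAX_TASKS];       /* has the task ever touched the FPU? */
    unsigned nm_faults;
} lazy_fpu;

static void lazy_init(lazy_fpu *s) {
    *s = (lazy_fpu){ .owner = -1, .ts = true };
}

/* Task switch: no FPU save here, just arm the trap. */
static void task_switch(lazy_fpu *s, int next) {
    s->current = next;
    s->ts = true;
}

/* #NM handler: move FPU ownership to the current task. */
static void nm_handler(lazy_fpu *s) {
    s->nm_faults++;
    s->ts = false;                                     /* CLTS */
    if (s->owner == s->current)
        return;                                        /* state is already loaded */
    if (s->owner >= 0)
        s->saved[s->owner] = s->live;                  /* FNSAVE for the old owner */
    if (s->used[s->current]) {
        s->live = s->saved[s->current];                /* FRSTOR */
    } else {
        s->live = (fpu_ctx){ 0x037F, 0x0000 };         /* FNINIT on first use */
        s->used[s->current] = true;
    }
    s->owner = s->current;
}

/* Simulated x87 instructions: trap if TS is set, then (re)execute. */
static void fpu_set_cw(lazy_fpu *s, uint16_t cw) {
    if (s->ts)
        nm_handler(s);
    s->live.fcw = cw;
}

static uint16_t fpu_get_cw(lazy_fpu *s) {
    if (s->ts)
        nm_handler(s);
    return s->live.fcw;
}
```

In this scheme every task sees a freshly initialized FPU on its first x87 instruction, so "retry" is built into the trap itself: the instruction simply runs again after the handler returns.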

It is strange that a Windows executable should fail the first time and work the next. I think it should be consistent. Either permanently fail until someone does an FNINIT, or work every time.

If coding a “retry” in HX is too much work, that’s fine, I think the FNINIT should be removed from the trap, to make it consistently fail, perhaps printing out a message saying “you need to run fninit.com first”.

But the next step is to recognize that HX running under Freedos is single-tasking anyway, and HX presumably doesn’t touch the floating point registers (and if it does, it probably saves them first), so HX (and/or Freedos) can do an FNINIT, as it is harmless and saves the need to implement a trap (both a real-mode trap and a protected-mode trap).

Given that my floating-point MSDOS executables are currently causing Freedos to hang, presumably due to an unhandled floating point exception, I need to start running fninit.com in autoexec.bat to make up for Freedos not doing “the right thing”.

If HX is being used to provide a pure Windows system, then it would probably be appropriate for HX to do an unconditional FNINIT at startup (so that implementing a trap is not required), because it is only HX that we care about, and hxldr32 will be executed unconditionally in autoexec.bat anyway. This is Windows, not MSDOS.

So my suggestion – put an FNINIT in hxldr32 (which will even fix the problem with people running MSDOS programs that use floating point) and remove the FNINIT from the trap handler, to get consistent failures in whatever strange circumstance caused the FPU to be in a bad state. Don’t add an FNINIT to the trap handler unless you’re also willing to code “retry” logic, and make the retry transparent to the end user (ie don’t print out any exception message as happens currently).

BFN. Paul.

kerravon86 commented 3 years ago

Maybe a simpler change that requires less work and is less controversial is just to print a message saying:

"FNINIT has just been done - please retry your command".

It's impossible to figure out what is happening at the moment. Nobody is going to think to retry the command to see if it now works. Software never works like that (fails, then suddenly works).

Baron-von-Riedesel commented 3 years ago

So my suggestion – put an FNINIT in hxldr32 (which will ... Maybe a simpler change that requires less work and is less controversial is just to print a message saying: ...

These suggestions are quite "hackish". If you want a clean fix, modify dkrnl32\except.asm, proc InitException:

if ?FLOATSUPP
    fninit              ;<<<<< add this line
    mov ax,0E00h        ;this function will fail on NT platforms
    int 31h
    jnc @F
    int 11h             ;here FPU is bit 1
    shl al,1            ;so shift it to bit 2
@@:

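The fallback logic in this fragment can be mirrored in C for readers less at home in MASM. DPMI function 0E00h (Get Coprocessor Status) reports a present numeric coprocessor in bit 2 of AX, while the BIOS equipment word returned by INT 11h uses bit 1, hence the `shl al,1` to bring both results into the same layout. The function name and the assumption that the code after `@@:` then tests bit 2 are illustrative, not taken from except.asm.

```c
#include <stdbool.h>
#include <stdint.h>

#define DPMI_COPRO_PRESENT 0x04u  /* DPMI 0E00h result: bit 2 = numeric coprocessor present */

/* dpmi_ok: INT 31h/0E00h returned with carry clear (it fails on NT platforms).
   int11_ax: BIOS equipment word, bit 1 = math coprocessor installed. */
static bool fpu_present(bool dpmi_ok, uint16_t dpmi_ax, uint16_t int11_ax) {
    uint8_t al = dpmi_ok ? (uint8_t)dpmi_ax
                         : (uint8_t)(int11_ax << 1);  /* shl al,1: bit 1 -> bit 2 */
    return (al & DPMI_COPRO_PRESENT) != 0;
}
```

The added FNINIT is placed before this detection and runs unconditionally when ?FLOATSUPP is enabled, which is consistent with the earlier remark that FNINIT can be executed even when no FPU is present.
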
kerravon86 commented 3 years ago

Ok, it's your product. The fix looks fine to me (as far as I understand it). Do you want me to organize a "pull request" in git to make that change?

kerravon86 commented 2 years ago

DB E3   fninit
C9      ret

I tried this code out today and I got various crashes. Writing the assembler code and assembling with wasm suggests that it should be C3, not C9. Do you concur?

Thanks. Paul.