dmsc / emu2

Simple x86 and DOS emulator for the Linux terminal.
GNU General Public License v2.0

error, unimplemented opcode 66 at cs:ip = 0097:4FB3 #56

Open mkst opened 1 year ago

mkst commented 1 year ago

We're looking for a lightweight alternative to dosemu2 for running some old DOS-based compilers, but instantly hit a hurdle with emu2:

$ ./emu2 /tmp/CC1PSX.EXE
./emu2: error, unimplemented opcode 66 at cs:ip = 0097:4FB3

I can see from the code that this is explicit behaviour:

    else if(inum == 0x06)
    {
        uint16_t ip = cpuGetStack(0);
        uint16_t cs = cpuGetStack(2);
        print_error("error, unimplemented opcode %02X at cs:ip = %04X:%04X\n",
                    memory[cpuGetAddress(cs, ip)], cs, ip);
    }

Is this because it's a significant amount of work to implement? Where would one even start if I wanted to try? Or should I throw in the towel now :)

dmsc commented 1 year ago

Hi!

The opcode you found is from a 386 instruction. Implementing the 386 instructions would be a significant effort, yes; it would certainly be easier to take an existing 386 CPU emulator and put it in place of the existing 286 one.

The 386 CPU added all the 32 bit instructions, the 32 bit protected-mode, paging, new exceptions, etc.

My original intention with EMU2 was to run old 16-bit DOS programs; those work on an 8088 or 8086 CPU and don't need newer opcodes.

Have Fun!

dmsc commented 1 year ago

To explain further: 66h is not an opcode but an opcode prefix (the operand-size override), which changes the operand size of the next instruction from 16 bits to 32 bits. So to implement it you will need to implement a 32-bit version of all the existing 16-bit instructions.
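To make the effect of the prefix concrete, here is a minimal C sketch of a decoder that handles only opcode 33h (XOR) in its register-to-register form and reports the operand width. It is not emu2's code: the `struct decoded` type and the `decode_xor_rr` function are hypothetical names made up for this illustration, and a real decoder would have to do the same for every opcode and for memory operands.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not an emu2 structure. */
struct decoded {
    int operand_bits;   /* 16 or 32 */
    int length;         /* instruction length in bytes, 0 if not handled */
    int reg;            /* destination register number (0-7) */
    int rm;             /* source register number (0-7) */
};

/* Decode "XOR reg, reg" (opcode 33h, ModRM mod=11) in 16-bit code,
 * honouring an optional 66h operand-size prefix. */
static struct decoded decode_xor_rr(const uint8_t *code, size_t len)
{
    struct decoded d = { 16, 0, 0, 0 };
    size_t pos = 0;

    if (pos < len && code[pos] == 0x66) {   /* operand-size override */
        d.operand_bits = 32;
        pos++;
    }
    if (pos + 1 >= len || code[pos] != 0x33)
        return d;                           /* not the opcode we handle */

    uint8_t modrm = code[pos + 1];
    if ((modrm >> 6) != 3)
        return d;                           /* memory operand: not handled */

    d.reg = (modrm >> 3) & 7;
    d.rm = modrm & 7;
    d.length = (int)(pos + 2);
    return d;
}
```

With this sketch, the bytes `33 C0` decode as a 2-byte 16-bit operation on register 0 (AX), while `66 33 ED` decodes as a 3-byte 32-bit operation on register 5 (EBP).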

tkchia commented 1 year ago

Hello @dmsc,

For context, it seems there is a version of this CC1PSX.EXE at https://archive.org/details/psyq-sdk .

I am a bit surprised that the program is executing a 32-bit instruction, apparently (?) without checking beforehand that it is running on a 32-bit-capable platform (alternatively, there might have been a check that somehow went wrong).

Thank you!

mkst commented 1 year ago

There are a few versions of the compiler. We are looking at versions 3.5 and 3.6, which only appear to exist as 16-bit binaries.

For a little more context, this is part of a matching decompilation project, where specific compiler versions are required to produce the correct result.

tkchia commented 1 year ago

Hello @mkst,

What are the last few (10 or so) instructions that are run before the unimplemented opcode 66? If you set an EMU2_DEBUG=cpu environment variable, then emu2 should save the sequence of (emulated) instructions into a file.

It is possible that CC1PSX.EXE actually requires 32-bit capability to run. I believe the archive.org version uses a 32-bit DOS extender and switches to 32-bit mode, even though it starts up in 16-bit mode.

Thank you!

mkst commented 1 year ago

Here are the last 20 instructions (I've also attached the whole log file, should that be of interest). It's all Greek to me!

$ tail -20 /tmp/CC1PSX.EXE-cpu.0.log
AX=0000 BX=0049 CX=0AFC DX=01E0 SP=FEC6 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BB2 NV UP EI PL ZR NA PE NC 0097:3BB2 7406             JZ      3BBA
AX=0000 BX=0049 CX=0AFC DX=01E0 SP=FEC6 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BBA NV UP EI PL ZR NA PE NC 0097:3BBA 837EF600         CMP     WORD PTR [BP-0A],00
AX=0000 BX=0049 CX=0AFC DX=01E0 SP=FEC6 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BBE NV UP EI PL ZR NA PE NC 0097:3BBE 752F             JNZ     3BEF
AX=0000 BX=0049 CX=0AFC DX=01E0 SP=FEC6 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BC0 NV UP EI PL ZR NA PE NC 0097:3BC0 8D46EE           LEA     AX,[BP-12]
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEC6 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BC3 NV UP EI PL ZR NA PE NC 0097:3BC3 50               PUSH    AX
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEC4 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=3BC4 NV UP EI PL ZR NA PE NC 0097:3BC4 E86F13           CALL    4F36
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEC2 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F36 NV UP EI PL ZR NA PE NC 0097:4F36 55               PUSH    BP
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEC0 BP=FFD6 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F37 NV UP EI PL ZR NA PE NC 0097:4F37 8BEC             MOV     BP,SP
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEC0 BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F39 NV UP EI PL ZR NA PE NC 0097:4F39 56               PUSH    SI
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEBE BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F3A NV UP EI PL ZR NA PE NC 0097:4F3A 57               PUSH    DI
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F3B NV UP EI PL ZR NA PE NC 0097:4F3B 8C0EF00E         MOV     [0EF0],CS
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F3F NV UP EI PL ZR NA PE NC 0097:4F3F 8C1EE80E         MOV     [0EE8],DS
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F43 NV UP EI PL ZR NA PE NC 0097:4F43 8C16F40E         MOV     [0EF4],SS
AX=FFC4 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F47 NV UP EI PL ZR NA PE NC 0097:4F47 B88716           MOV     AX,1687
AX=1687 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F4A NV UP EI PL ZR NA PE NC 0097:4F4A CD2F             INT     2F
AX=1687 BX=0049 CX=0AFC DX=01E0 SP=FEB6 BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0000 IP=002F NV UP DI PL ZR NA PE NC 0000:002F ??               IRET    (EMU 2F)
AX=1687 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F4C NV UP EI PL ZR NA PE NC 0097:4F4C 23C0             AND     AX,AX
AX=1687 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4F4E NV UP EI PL NZ NA PE NC 0097:4F4E 7561             JNZ     4FB1
AX=1687 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4FB1 NV UP EI PL NZ NA PE NC 0097:4FB1 33C0             XOR     AX,AX
AX=0000 BX=0049 CX=0AFC DX=01E0 SP=FEBC BP=FEC0 SI=0000 DI=A917 DS=1079 ES=0040 SS=1079 CS=0097 IP=4FB3 NV UP EI PL ZR NA PE NC 0097:4FB3 66               DB      66

CC1PSX.EXE-cpu.0.log

dmsc commented 1 year ago

Hi!

From that log, the instruction at that address is "XOR EBP, EBP". This is the disassembly of the end of the function:

0097:211F   33C0    XOR     AX,AX
0097:2121   6633ED  XOR     EBP,EBP     <= 80386 INSTRUCTION
0097:2124   5F      POP     DI
0097:2125   5E      POP     SI
0097:2126   5D      POP     BP
0097:2127   C3      RET

So, the instruction does not have any side effects (it is followed by a POP BP); all it does is crash on any processor without 32-bit support, so this is intentional.

Note that just above there is an INT 2Fh call (with AX=1687h), which checks for any DOS extender (DPMI host) loaded. If none answers, or if the function reports that 32-bit support is not available, the code also jumps to the faulting instruction:

0097:20B5 B88716           MOV     AX,1687
0097:20B8 CD2F             INT     2F
0000:002F ??               IRET    (EMU 2F)
0097:20BA 23C0             AND     AX,AX
0097:20BC 7561             JNZ     211F
0097:20BE 90               NOP
0097:20BF 90               NOP
0097:20C0 F7C30100         TEST    BX,0001
0097:20C4 7459             JZ      211F
0097:211F 33C0             XOR     AX,AX
0097:2121 6633ED           XOR     EBP,EBP
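The two conditional jumps in this listing can be restated in C as a rough sketch. It assumes the usual DPMI convention for INT 2Fh with AX=1687h (AX returns 0 when a host is present, and bit 0 of BX is set when 32-bit programs are supported); the `struct regs` type and the function name are hypothetical and not part of emu2.

```c
#include <stdint.h>
#include <stdbool.h>

/* Register values as seen right after "INT 2Fh" with AX=1687h.
 * Hypothetical type for this sketch only. */
struct regs { uint16_t ax, bx; };

/* Mirrors the checks in the listing: AND AX,AX / JNZ, then
 * TEST BX,0001 / JZ. Returns true when the program would continue,
 * false when it would jump to the XOR EBP,EBP path. */
static bool dpmi_32bit_available(struct regs r)
{
    if (r.ax != 0)          /* AX left non-zero: no DPMI host answered */
        return false;
    if ((r.bx & 1) == 0)    /* BX bit 0 clear: no 32-bit program support */
        return false;
    return true;
}
```

In the emu2 log above, the registers after the emulated INT 2Fh are AX=1687 and BX=0049, so the first check already fails and execution goes straight to the 386 instruction.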

And finally, if the offending instruction is replaced with a NOP, this is the result:

~/psyq/psyq$ emu2 CC1PSX.EXE
CPU must be a 386 to run this program.
~/psyq/psyq$ 

Have Fun!

mkst commented 1 year ago

Does this suggest that the program could be patched to be treated as a 32-bit app from the get-go (and thus runnable via wine)? Or is this a fundamentally 16-bit app with a 32-bit extension (and therefore 386 emulation is the only way to go)?

Thanks for your time & responses!

dmsc commented 1 year ago

You probably need a DOS extender and an 80386 emulator. I don't know if it uses a DOS extender to work in protected mode or simply to access extended memory. Try running it in DOSBox with DPMI disabled.

tkchia commented 1 year ago

Hello @dmsc,

I tried the archive.org version of cc1psx.exe, and it did work under DOSBox (though this was not initially obvious — it seemed to hang, but really it was just waiting for input from the keyboard).

Thank you!

mkst commented 1 year ago

I should have been clearer: we can run the CC1PSX compiler via dosemu/dosbox, BUT we are after a lighter solution. We are finding that the process seems to take 5 seconds to run a compilation command (still trying to determine what exactly is going wrong; it only takes 500ms when running it by hand), and I wondered if emu2 could be a lightweight alternative.

tkchia commented 1 year ago

@mkst : what exactly do you mean by running cc1psx.exe "by hand"?

(I suppose if you run the program under different conditions, then yes, you are going to get some extra overhead in different places. So perhaps what you simply need to do is to figure out where the extra overhead is coming from when you start up cc1psx.exe via DOSEmu or DOSBox. Unfortunately it does not seem that emu2 can easily support running this program in the short term...)

Thank you!

mkst commented 1 year ago

Hah I've been trying to keep the details light to avoid derailing this Issue... but here goes.

We have a PR to our project (https://github.com/decompme/decomp.me/pull/651) that adds dosemu2 in order to support these old compilers that won't run under WINE.

When testing the PR locally I found that the execution of dosemu2 is taking ~5 seconds. strace throws out lots of stuff but nothing is hanging; it's just doing a bunch of stuff. When I call the same command from a simple Python script, execution only takes 500ms. So I need to do some further investigation to try to determine why a subprocess.run call in our app takes 5 seconds, but a subprocess.run in a standalone script takes 500ms (the same time as running it by hand, from the shell).

As I was getting nowhere with figuring out the slowdown, I went on a hunt for alternatives to dosemu2, of which emu2 looked like a potential candidate. However, as it only supports 16-bit instructions (and this CC1PSX requires 32-bit support but presents itself as a 16-bit exe), it turns out it's not going to be a drop-in replacement.

As another tangent, we have our own lightweight alternative to WINE (https://github.com/decompals/wibo) which is only for CLI applications (it has some magic to kick off the exe and then intercepts all DLL calls with our own replacements). Do you know if something similar would be possible for DOS, i.e. intercepting system calls rather than trying to emulate an x86 processor?

dmsc commented 1 year ago

Hi!

> As another tangent, we have our own lightweight alternative to WINE (https://github.com/decompals/wibo) which is only for CLI applications (it has some magic to kick off the exe and then intercepts all DLL calls with our own replacements). Do you know if something similar would be possible for DOS, i.e. intercepting system calls rather than trying to emulate an x86 processor?

You will need to emulate at least an 80386, as DOS applications rely on x86 16-bit support and real mode, and this is very difficult to manage on current x86 processors. Also, by running natively, you can't run your program on other CPU architectures.

For example, cc1psx.exe accesses I/O ports 21h and A1h, which will need port emulation. And many DOS programs write directly to the screen; emu2 has a text-screen emulator so it can keep the DOS I/O, BIOS I/O and direct screen access synchronized.

I refrained from adding 80386 support to emu2 because it opens a can of worms: many DOS programs, when detecting a 386, try to start in protected mode, or even in unreal mode. I have a branch that tries to add 80286 protected mode support, and I could not make it work with many programs, as 286 protected mode is not that documented.

ghaerr commented 1 year ago

Hello @dmsc,

> this is the disassembly of the function

Nice analysis!

> it only crashes on any processor without 32 bit support
>
> if the offending instruction is replaced with a NOP, this is the result

Out of curiosity, on real hardware there's no crash; instead an 80186 or 80286 CPU will generate an invalid opcode exception (INT 6), rather than ignore the 66h operand-size prefix like the 8086 does, correct? Does MSDOS (or the BIOS) just have a null (IRET only) handler for this vector, causing the program to resume execution just past the 66h byte for what will then be a XOR BP,BP instruction, after which the "CPU must be a 386 to run this program" message is displayed?

> For example, cc1psx.exe accesses I/O ports 21h and A1h

Wow, so an MSDOS .exe talks to the PC programmable interrupt controller... Do you find this and other hardware port I/O is somewhat common in programs that attempt to manage XMS or APIs that came later in DOS?

Thank you!

dmsc commented 1 year ago

Hi!

> Nice analysis!

> it only crashes on any processor without 32 bit support
>
> if the offending instruction is replaced with a NOP, this is the result

> Out of curiosity, on real hardware there's no crash; instead an 80186 or 80286 CPU will generate an invalid opcode exception (INT 6), rather than ignore the 66h operand-size prefix like the 8086 does, correct?

Yes. emu2 does generate the INT 6 (actually a TRAP in an 80286), and the default handler terminates the application. On an 8086, the opcode 66h is an alias for 76h ("JBE"), consuming the next byte and generally executing bogus code.
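The aliasing mentioned here can be sketched in C as follows. This is only an illustration of the mapping (on the 8086/8088, opcodes 60h-6Fh decode like the short conditional jumps at 70h-7Fh); the function names are hypothetical and the mnemonics follow the spelling used in the emu2 log (JZ/JNZ and so on).

```c
#include <stdint.h>

/* Returns the opcode an 8086/8088 would effectively execute:
 * 60h-6Fh behave like the conditional jumps 70h-7Fh. */
static uint8_t alias_8086(uint8_t opcode)
{
    if (opcode >= 0x60 && opcode <= 0x6F)
        return opcode | 0x10;
    return opcode;
}

/* Mnemonic for a short conditional jump opcode (70h-7Fh),
 * or an empty string for anything else. */
static const char *jcc_name(uint8_t opcode)
{
    static const char *const names[16] = {
        "JO", "JNO", "JB", "JAE", "JZ", "JNZ", "JBE", "JA",
        "JS", "JNS", "JPE", "JPO", "JL", "JGE", "JLE", "JG"
    };
    if (opcode < 0x70 || opcode > 0x7F)
        return "";
    return names[opcode & 0x0F];
}
```

For the byte in question, `alias_8086(0x66)` gives 76h, and `jcc_name(0x76)` gives "JBE". The following byte (33h here) would then be consumed as the jump displacement.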

> Does MSDOS (or the BIOS) just have a null (IRET only) handler for this vector, causing the program to resume execution just past the 66h byte for what will then be a XOR BP,BP instruction, after which the "CPU must be a 386 to run this program" message is displayed?

No, the 80286 TRAP should set the CS:IP saved on the stack to the faulting instruction, so an IRET will return to the same instruction, causing a fault again, so the PC will lock up.
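A toy model in C shows why an IRET-only handler cannot make progress. It is a deliberately simplified sketch, not a CPU emulator: every byte other than 66h is treated as a one-byte instruction, and the `struct toy_cpu` type and `run` function are hypothetical.

```c
#include <stdint.h>

/* Toy state: instruction pointer and a count of invalid-opcode faults. */
struct toy_cpu { uint16_t ip; unsigned faults; };

/* Attempt up to max_steps instructions. The INT 6 handler is modelled
 * as a bare IRET. */
static void run(struct toy_cpu *cpu, const uint8_t *mem, unsigned max_steps)
{
    for (unsigned i = 0; i < max_steps; i++) {
        if (mem[cpu->ip] == 0x66) {
            /* The saved IP is the address of the faulting instruction,
             * and the handler's IRET simply reloads that same IP. */
            uint16_t saved_ip = cpu->ip;
            cpu->faults++;
            cpu->ip = saved_ip;
        } else {
            cpu->ip++;  /* pretend this was a 1-byte instruction */
        }
    }
}
```

Running this on the bytes `90 66 90` for ten steps leaves `ip` stuck at offset 1 with nine faults counted, which is the lock-up described above.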

> For example, cc1psx.exe accesses I/O ports 21h and A1h

> Wow, so an MSDOS .exe talks to the PC programmable interrupt controller... Do you find this and other hardware port I/O is somewhat common in programs that attempt to manage XMS or APIs that came later in DOS?

Accessing the PIC and the keyboard controller is typical; this is needed to exit from protected mode on the 80286.

For a command line program, you can ignore all those accesses; most are there because they are part of the C standard library initialization. Most C runtimes start by saving the interrupt handlers and accessing the PIC to ensure that software interrupts are properly handled.
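One way to "ignore" those accesses is a pair of stub port handlers that just remember the last value written. The C sketch below is hypothetical and is not how emu2 implements port I/O: it treats ports 21h and A1h purely as the interrupt mask registers of the two PICs, skips the PIC initialization sequence entirely, and the initial mask and the FFh default for unhandled ports are assumptions.

```c
#include <stdint.h>

/* Remembered mask values: [0] = master PIC (port 21h),
 * [1] = slave PIC (port A1h). Initial value is an assumption. */
static uint8_t pic_mask[2] = { 0xFF, 0xFF };

/* Hypothetical handler for OUT instructions. */
static void port_write(uint16_t port, uint8_t value)
{
    switch (port) {
    case 0x21: pic_mask[0] = value; break;  /* master PIC mask register */
    case 0xA1: pic_mask[1] = value; break;  /* slave PIC mask register */
    default:   break;                       /* silently ignore the rest */
    }
}

/* Hypothetical handler for IN instructions. */
static uint8_t port_read(uint16_t port)
{
    switch (port) {
    case 0x21: return pic_mask[0];
    case 0xA1: return pic_mask[1];
    default:   return 0xFF;                 /* assumed for unhandled ports */
    }
}
```

A runtime that saves the mask at startup and restores it at exit would read back what it wrote, which is usually all a command line program needs.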

Have Fun!

tkchia commented 1 year ago

Hello @dmsc,

> As another tangent, we have our own lightweight alternative to WINE (https://github.com/decompals/wibo) which is only for CLI applications (it has some magic to kick off the exe and then intercepts all DLL calls with our own replacements). Do you know if something similar would be possible for DOS, i.e. intercepting system calls rather than trying to emulate an x86 processor?

> You will need to emulate at least an 80386, as DOS applications rely on x86 16-bit support and real mode, and this is very difficult to manage on current x86 processors. Also, by running natively, you can't run your program on other CPU architectures.

I think something like WiBo might be able to run cc1psx.exe, but as you pointed out, this approach will only work on x86's, at least initially. (And of course, someone will need to expend the effort to program such a thing.)

I think what a WiBo-like wrapper will need to care about, is what services the actual 32-bit COFF program — the thing that appears after the DOS extender stub in the program binary — really needs. The DOS extender used seems to be go32 v1.x, so I guess the task will be to figure out the go32 ABI that is implemented.
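As a starting point for locating "the thing that appears after the DOS extender stub", the C sketch below computes the size of the leading MZ image from its header fields and checks whether the bytes right after it carry the i386 COFF magic (014Ch). The function names are hypothetical, and it is an assumption that the payload in this particular binary starts immediately after the MZ image and is COFF; a go32 v1 payload may use a different format.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Size in bytes of the MZ image at the start of buf, i.e. the file
 * offset where an appended payload would begin. Returns 0 if buf does
 * not start with a plausible MZ header. */
static size_t mz_image_size(const uint8_t *buf, size_t len)
{
    if (len < 6 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    uint16_t last_page_bytes = rd16(buf + 2);  /* bytes used in last page */
    uint16_t pages = rd16(buf + 4);            /* number of 512-byte pages */
    if (pages == 0 || last_page_bytes > 512)
        return 0;
    size_t size = (size_t)pages * 512;
    if (last_page_bytes != 0)
        size -= 512 - last_page_bytes;         /* last page is partial */
    return size;
}

/* Non-zero if the two bytes at offset off are the i386 COFF magic. */
static int looks_like_i386_coff(const uint8_t *buf, size_t len, size_t off)
{
    return off != 0 && off + 2 <= len && rd16(buf + off) == 0x014C;
}
```

Reading the whole executable into a buffer and calling `mz_image_size` followed by `looks_like_i386_coff` would give a first hint about where the 32-bit program starts, before working out which go32 services it expects.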

Thank you!

tkchia commented 1 year ago

Hello @dmsc,

> I have a branch that tries to add 80286 protected mode support, and I could not make it work with many programs, as 286 protected mode is not that documented.

I am curious — what problems did you encounter in particular?

(And, I do not quite believe that 80286 protected mode is "not that documented". Intel's Software Developer's Manual is still very much around. And the source code listings of IBM's PC AT BIOS, including its protected mode services, are available. But one needs to read these really closely.)

Thank you!