GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0
646 stars 60 forks source link

PE32 reassembly issues #24

Closed Trass3r closed 3 years ago

Trass3r commented 3 years ago

Following up on https://github.com/GrammaTech/ddisasm/issues/11#issuecomment-801455091.

error A2008: syntax error : pushal
error A2008: syntax error : popal
error A2152: coprocessor register cannot be first operand : fdivr ST(0)

Should be pushad/popad and fdivr ST(0),ST(0) (D8 F8).

And there are false positives where it identified RGB data in the resource section as references, I'm not sure if that is even possible:

_RSRC SEGMENT
ALIGN 16
...
          BYTE 077H
          BYTE 077H
          BYTE 077H
          BYTE 000H
          DWORD $L_6b6b6b
          DWORD $L_5f5f5f
          DWORD $L_535353
          BYTE 047H
          BYTE 047H
          BYTE 047H
          BYTE 000H

Also there were issues with name mangling. I had to use ml /link /entry:_EntryPoint instead of /entry:__EntryPoint (it then still resulted in warning LNK4216: Exported entry point __EntryPoint though).

Then it generated import declarations like

EXTERN __imp__CreateFileA:PROC
EXTERN _CreateFileA:PROC

while C:\Program Files (x86)\Windows Kits\10\Lib\10.0.18362.0\um\x86\kernel32.Lib contains stdcall decorated

_CreateFileA@28 __imp__CreateFileA@28
test.obj : error LNK2001: unresolved external symbol __imp__CreateFileA
test.obj : error LNK2001: unresolved external symbol _CreateFileA
itsmattkc commented 3 years ago

fsubr has the same issue. Instruction is disassembled as fsubr st(0) instead of fsubr st(0), st(0).

itsmattkc commented 3 years ago

It's worth noting that a lot of these issues appear to be exclusive to 32-bit binaries. When I try to reassemble the ex1 example, it works perfectly with when the executable is compiled 64-bit and reassembled with ml64. However when it's compiled 32-bit and reassembled with ml, that's when errors like error LNK2001: unresolved external symbol appear.

Upon investigation this would make sense because the exports are genuinely different between 32-bit and 64-bit Windows libraries. Here is some of the output of dumpbin /exports for the 32 and 64 bit versions (respectively) of kernel32.lib: image

I've also had issues on some disassemblies produced by ddisasm (after manually correcting the unresolved external symbol issues) where ml will simply hang indefinitely. This does not occur with a program as simple as ex1, but on some larger disassemblies, ml appears to enter some kind of infinite loop, using CPU cycles but never exiting. I have a sneaking suspicion this is related to floating point instructions too (due to the nature of the binaries that trigger this and other reports of MASM having stability problems with incorrect float instructions), but this is really just a theory.

kwarrick commented 3 years ago

@Trass3r

I had to use ml /link /entry:_EntryPoint instead of /entry:EntryPoint (it then still resulted in warning LNK4216: Exported entry point EntryPoint though).

That second warning should be fixed now, but the entry point with a single underscore is still necessary as @itsmattkc shows 32-bit and 64-bit export symbols are handled differently.

@itsmattkc

Upon investigation this would make sense because the exports are genuinely different between 32-bit and 64-bit Windows libraries.

Yes, this is something I am still unhappy about, but it appears impossible to provide a symbol name in the assembly for 32-bit that link.exe will resolve correctly.

We have sidestepped the issue by adding a --generate-import-libs command-line argument that will generate .DEF file and call lib.exe to create a .LIB file in your working directory for each import. These will satisfy the linker.

$ cd ddisasm/examples/ex1
$ cl /nologo ex.c
$ ddisasm --asm out.asm --generate-import-libs ex.exe
$ ls *.lib
KERNEL32.lib
$ ml out.asm /link /subsystem:console /entry:_EntryPoint

That should fix all the unresolved external symbol errors.

kwarrick commented 3 years ago

PUSHAL, POPAL, FDIVR, and FSUBR should all be printed correctly now.

Trass3r commented 3 years ago

What about the resource section issue? And which commit fixed the instructions?

kwarrick commented 3 years ago

Can you provide a sample binary for the resource section issue?

The instructions are fixed in gtirb-pprinter:

https://github.com/GrammaTech/gtirb-pprinter/commit/70ecde07a6c05836f9278ad863fccf40e5e1f8d7 https://github.com/GrammaTech/gtirb-pprinter/commit/701ed49d5844e128ee25ec38b28e87f74023a5bf

kwarrick commented 3 years ago

The warning LNK4216: Exported entry point __EntryPoint problem was fixed here: https://github.com/GrammaTech/gtirb-pprinter/commit/c6af84bc78fa955e28a2d0a8756010b8d80bd1f2

Trass3r commented 3 years ago

We have sidestepped the issue by adding a --generate-import-libs command-line argument that will generate .DEF file and call lib.exe to create a .LIB file in your working directory for each import. These will satisfy the linker.

I see, nice idea. But it does not work on Linux / in the docker container of course: ERROR: Unable to find lib.exe.

kwarrick commented 3 years ago

Yes, that is annoying. With your suggestion of UASM though, we should be able to use uasm and LLVM's lld-link on Linux. That will let us disassemble, reassemble, and link Windows PE without any MSVC dependencies.

I'll give an update when that lands in gtirb-pprinter.

Trass3r commented 3 years ago

Can you provide a sample binary for the resource section issue?

Maybe with an .rc file like:

testrsrc RCDATA
{
    0x77,
    0x77,
    0x77,
    0x00,
    0x6b,
    0x6b,
    0x6b,
    0x00,
    0x5f,
    0x5f,
    0x5f,
    0x00,
    0x53,
    0x53,
    0x53,
    0x00,
    0x47,
    0x47,
    0x47,
    0x00
}

Just with data that looks like an actual .text section address, like 0x401000 or something.

kwarrick commented 3 years ago

@Trass3r

Oh, this is clearer to me now. I overlooked the fact the original false positive you showed was in the _RSRC section.

Please correct me if I am wrong, but we should never be symbolizing data in resource data. Right?

Trass3r commented 3 years ago

I think so cause it's pure external data, wouldn't know how to put a reference in there.