Open Trass3r opened 2 years ago
The GTIRB representation has more specific details on the function boundaries. E.g. the aux data table "functionBlocks" has a list of basic blocks contained in each function. This information is currently ignored by the MASM pretty printer. Using PROC
definitions was causing problems for us in certain cases, so we avoided using them, but it's probably time to revisit that decision.
Regarding the non-trivial differences, if you can give more details or example on instructions using different registers and different instructions, we could look into it further. I don't see a reason why that should be the case.
This information is currently ignored by the MASM pretty printer. Using PROC definitions was causing problems for us in certain cases, so we avoided using them, but it's probably time to revisit that decision.
Ah I already suspected something like this. +1
I don't see a reason why that should be the case.
Ok thanks good to know.
Not sure what the best way to compare would be so I reassembled the file and compared the objdump disassembly with TortoiseMerge. It allows you to set up regex filters to ignore hex numbers ([0-9a-fA-F]+
) so you mostly get the actual differences.
I just noticed that the first big difference is actually a jump table, so the disasm is just misleading and it's ok:
jmp DWORD PTR [eax*4+0x1000e263]
https://gist.github.com/Trass3r/db3a655ef9e5a5e32bd704786f0d52f6#file-org-asm-L15473
There are also subtle cases where an int3 is missing after a call, seems like it's a noreturn function call:
1000ccf1: e8 0b 1a 00 00 call 0x1000e701
1000ccf6: cc int3
1000ccf7: 8b ff mov edi,edi
1000ccf9: 55 push ebp
Is there a better approach? I guess the first step would be to fix the alignment to match. This binary actually works re-assembled but in another test case there were noticeable behavior changes.
Ok seems like alignment is fine, msvc link inserts a bunch of CC bytes at the start of .text for whatever reason. Using lld-link fixes that but the offsets will still get out of sync by different encodings being used for some instructions. Regarding the jump tables, maybe they can be annotated somehow to mark them as data, fwiw.
leave
ret
BYTE 08bH
BYTE 0ffH
$L_1000e263 DWORD $L_1000da4d
DWORD $L_1000d84c
DWORD $L_1000d87e
DWORD $L_1000d8da
DWORD $L_1000d926
DWORD $L_1000d932
DWORD $L_1000d978
DWORD $L_1000daa8
Yet in the resulting (edit: MASM) disassembly most functions are simple labels, indistinguishable from function-internal block labels. Even an exported function is just an empty
PROC EXPORT
ENDP
block in front of the function. Is there a way to improve this or is it simply not implemented?Also, are non-trivial differences in the assembly to be expected? When comparing the original executable and the re-assembled one I saw NFCs like alignment or lea [x] vs lea [x+0], but also completely different registers and instructions sometimes.