diasurgical / devilution-comparer

Small helper tool to aid comparing functions between devilution and the original binary
The Unlicense
21 stars 9 forks source link

Show relative jumps as relative jumps instead of their calculated address #1

Closed seritools closed 6 years ago

seritools commented 6 years ago

This should make the "diff noise" considerably smaller, for example in cases where there is an additional instruction shifting everything after that a few bytes. This causes any line with a relative jump to show a diff, since the calculated absolute address is different, even though they all are correct/binary exact.

capstone-rs currently doesn't expose this info, so we'll probably either have to fork it or use the underlying capstone-sys, which are raw bindings to the lib.

nomdenom commented 6 years ago

As I mentioned earlier, perhaps we could use a Docker-ized version of IDA freeware (there's a Linux version) to do the disassembly? IDA has "enumerated names (eg. loc_1)" available under "Options > Name Representation...", though not sure how to set that from the command line.... With that, the locations or deltas do not even matter, just the order of the basic blocks.

seritools commented 6 years ago

Looked into graph possibilities, and radiff2 from the radare toolset could be promising for that:

radiff2 -O -g 0x0041F320,0x004214E3 <diablo_orig.exe> <devilution.exe> | dot -Tpdf -o foo.pdf

gave me the following: foo.pdf

Especially nice is the -O switch: code diffing with opcode bytes only This means it won't complain about wrong function offsets (I think?). The output isn't quite as helpful as I thought, a textual output could still prove easier to use.

seritools commented 6 years ago

Alright, switched over to another disassembler that gives you access to all of the details, and you can hook into the actual output generation.

You can now select the level of information you want in the output! Should definitely help with the amound of diff noise.

Address output at the start of every line is disabled by default now as well, use -i or --show-ip to enable it again. Since relative jumps now actually show up as relative, both orig.asm and compare.asm now show their own addresses as well.

Default output now, without any switches:

mov [esp+0x0F], bl
mov [esp+0x10], edi
lea ecx, [esi+0x682808]
call <imm_fn>
call [0x47F184]
mov ecx, eax
call <imm_fn>
cmp edi, 0x04
jb $+0xD
push edi
push 0x4A694C
call <imm_fn>
pop ecx
pop ecx
movsx ecx, bl
shl ecx, 0x02
mov [esi+0x682968], bl
xor ebx, ebx
mov al, [ecx+0x4A66AC]
cmp al, bl
jnl $+0x2
xor al, al
movsx eax, al

With --no-mem-disp:

mov [esp+<disp>], bl
mov [esp+<disp>], edi
lea ecx, [esi+<disp>]
call <imm_fn>
call [<indir_fn>]
mov ecx, eax
call <imm_fn>
cmp edi, 0x04
jb $+0xD
push edi
push 0x4A694C
call <imm_fn>
pop ecx
pop ecx
movsx ecx, bl
shl ecx, 0x02
mov [esi+<disp>], bl
xor ebx, ebx
mov al, [ecx+<disp>]
cmp al, bl
jnl $+0x2
xor al, al
movsx eax, al

With --no-mem-disp --no-imms:

mov [esp+<disp>], bl
mov [esp+<disp>], edi
lea ecx, [esi+<disp>]
call <imm_fn>
call [<indir_fn>]
mov ecx, eax
call <imm_fn>
cmp edi, <imm>
jb $+0xD
push edi
push <imm>
call <imm_fn>
pop ecx
pop ecx
movsx ecx, bl
shl ecx, <imm>
mov [esi+<disp>], bl
xor ebx, ebx
mov al, [ecx+<disp>]
cmp al, bl
jnl $+0x2
xor al, al
movsx eax, al

The more switches you enable, the easier is the diffing (but the more likely it is to miss differences that are masked). The default settings should be used predominantly for the main cleanup work.

mewmew commented 6 years ago

@seritools That's really great! Zydis looks like a clean library as well.

The more switches you enable, the easier is the diffing (but the more likely it is to miss differences that are masked). The default settings should be used predominantly for the main cleanup work.

The end goal will be to make the diff identical without having to use any switches. And as you mentioned, for now the main work will be to get Devilution almost correct, as that will help resolve a lot of bugs in the process.