cesena / ghidra2dwarf

🐉 Export ghidra decompiled code to dwarf sections inside ELF binary
MIT License
179 stars 17 forks source link

Fix line number table corruption, fix #12 #21

Closed nneonneo closed 1 year ago

nneonneo commented 1 year ago

This is a rollup PR for a few miscellaneous fixes that I've made to ghidra2dwarf, including a fix for #12. With the fix, DWARF export for ARM works fine.

fam4r commented 1 year ago

ping @meowmeowxw

meowmeowxw commented 1 year ago

Hi @nneonneo , Thank you for the PR. Can you try to push an empty commit or to rebase to the latest commit of the master branch? I should have fixed the failing workflow, but I can't find a way to run it against your branch without merging your master branch into a separate branch in this repository :thinking: . Do you have a binary that I can test the PR with? I also have to validate the new offsets (+ - 1)

nneonneo commented 1 year ago

Pushed. Here's a test binary with source: testbin.zip

This is compiled with x86_64-linux-android-gcc -m32 --sysroot=${NDK}/platforms/android-21/arch-x86 test.c -g -o test_gcc_dbg. test is the stripped version of test_gcc_dbg.

testbin_g2d.zip contains the binaries generated by the current version (test_dbg_old) and the PR version (test_dbg_new).

As a test, try the following:

$ gdb ./test_dbg_old
GNU gdb (GDB) 12.1
...
(gdb) disas hswap
Dump of assembler code for function hswap:
   0x080483b8 <+0>: push   ebp
   0x080483b9 <+1>: mov    ebp,esp
   0x080483bb <+3>: mov    eax,DWORD PTR [ebp+0x8]
   0x080483be <+6>: shl    eax,0x10
   0x080483c1 <+9>: mov    edx,eax
   0x080483c3 <+11>:    mov    eax,DWORD PTR [ebp+0x8]
   0x080483c6 <+14>:    shr    eax,0x10
   0x080483c9 <+17>:    or     eax,edx
End of assembler dump.

And with the PR:

$ gdb ./test_dbg_new
GNU gdb (GDB) 12.1
...
(gdb) disas hswap
Dump of assembler code for function hswap:
   0x080483b8 <+0>: push   ebp
   0x080483b9 <+1>: mov    ebp,esp
   0x080483bb <+3>: mov    eax,DWORD PTR [ebp+0x8]
   0x080483be <+6>: shl    eax,0x10
   0x080483c1 <+9>: mov    edx,eax
   0x080483c3 <+11>:    mov    eax,DWORD PTR [ebp+0x8]
   0x080483c6 <+14>:    shr    eax,0x10
   0x080483c9 <+17>:    or     eax,edx
   0x080483cb <+19>:    pop    ebp
   0x080483cc <+20>:    ret
End of assembler dump.

As you can see, the fixed version properly includes the epilogue (pop ebp, ret). You can also verify that the new version's DW_AT_high_pc entries match what GCC generates in test_gcc_dbg for hswap and main.


As for ARM support, try the following binary (same .c file as before), compiled with arm-linux-androideabi-gcc --sysroot=${NDK}/platforms/android-21/arch-arm test.c -o test_arm_gcc_dbg:

test_arm.zip

It will crash with the error Exception: enum SHT has no entry with code 1879048195. on the current master, and will succeed with the PR.


P.S. The latest batch of changes includes a small fix to add DW_LNE_end_sequence to the very end of the .debug_line section. This fixes the warning warning: last sequence in debug line table at offset 0x00000000 is not terminated generated by dwarfdump --debug-line. You can verify that this warning shows up with dwarfdump --debug-line test_dbg_old but not with dwarfdump --debug-line test_dbg_new. You'll also note that dwarfdump --verify test_dbg_old throws many errors and warnings; dwarfdump --verify test_dbg_new does not.

NextLight commented 1 year ago

Thank you so much for the PR (and for still using this)!

@meowmeowxw the docs confirm that the +1 offset is correct, it says DW_AT_high_pc should be the "address of the first location past the last instruction associated with the entity". Actually this was already fixed in my (wip) rewrite and I never noticed, so I guess I also tested it.

Great catch with figuring out that the addresses need to be sorted! Is it documented somewhere?
I stopped working on this out of frustration because it never seemed to work properly with big binaries and I couldn't figure out why. I always assumed it was a bug in libdwarf and they made building the new version a nightmare so I gave up. Your commit probably fixes everything so thank you for the dedication and for making it public!

I'll hopefully have some time to test it in the next few days, but the code looks very reasonable and I'm also ok with merging this blindly @meowmeowxw .

nneonneo commented 1 year ago

I found this out essentially by realizing that libdwarf was generating garbage when the addresses decreased.

However, I found this in the DWARF v2 standard, in section 6.2.5: “Within a sequence, addresses may only increase.”. Similar text is found in the DWARF v4 standard.

nneonneo commented 1 year ago

I've updated the comments on the commits to properly reference the DWARF standard where appropriate.

meowmeowxw commented 1 year ago

Hello @nneonneo , Thank you for the references in the commit messages, I remember that we spent lots of hours trying to understand why there was a corruption on big binaries of the dwarf table. I tested the new changes and they work fine :smile: , I merge the PR and release the new version