HexHive / retrowrite

RetroWrite -- Retrofitting compiler passes through binary rewriting

Symbolizing memory access fails to identify symbol #30

Open diagprov opened 2 years ago

diagprov commented 2 years ago

As part of an ongoing evaluation of RetroWrite by a third party, we identified a case that fails to symbolize correctly. The following steps reproduce it:

wget https://www.busybox.net/downloads/busybox-1.35.0.tar.bz2
tar xf busybox-1.35.0.tar.bz2
cd busybox-1.35.0
make defconfig
make menuconfig # in here, change to a PIE binary
make
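
Since the failure depends on the binary being built as PIE, it can help to confirm that the menuconfig change took effect before running the rewriter. The sketch below is not part of RetroWrite; `is_pie_elf` and `file_is_pie` are hypothetical helper names. It reads the ELF header and checks that `e_type` is `ET_DYN` (3), which is what position-independent executables use. Shared libraries also report `ET_DYN`, so this is only a sanity check.

```python
import struct

ET_DYN = 3  # ELF e_type used by shared objects and position-independent executables


def is_pie_elf(header: bytes) -> bool:
    """Return True if the given ELF header bytes describe an ET_DYN object."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field immediately after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type == ET_DYN


def file_is_pie(path: str) -> bool:
    """Hypothetical convenience wrapper: check a file on disk."""
    with open(path, "rb") as f:
        return is_pie_elf(f.read(18))
```

Assuming the build leaves the binary at `./busybox` (an assumption about the build layout, not something stated above), `file_is_pie("busybox")` should return `True` for a PIE build.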

Running RetroWrite on the resulting binary raises the following exception:

Traceback (most recent call last):
  File "/retrowrite/retro/bin/retrowrite_x64", line 168, in <module>
    rw.symbolize()
  File "/retrowrite/librw_x64/rw.py", line 76, in symbolize
    symb.symbolize_text_section(self.container, None)
  File "/retrowrite/librw_x64/rw.py", line 523, in symbolize_text_section
    self.symbolize_mem_accesses(container, context)
  File "/hexhive/retrowrite/librw_x64/rw.py", line 730, in symbolize_mem_accesses
    target, adjust = self._adjust_target(
  File "/hexhive/retrowrite/librw_x64/rw.py", line 645, in _adjust_target
    assert sec is not None
AssertionError

Adding the following diagnostic code:

diff --git a/librw_x64/rw.py b/librw_x64/rw.py
index 7c36b2f..9e3b9a1 100644
--- a/librw_x64/rw.py
+++ b/librw_x64/rw.py
@@ -680,6 +680,8 @@ class Symbolizer():
                     ripbase = inst.address + inst.sz
                     target = ripbase + value

+                    print("RIP REL Information Value=0x%x,RIPBASE=0x%x,TARGET=0x%x" % (value, ripbase, target))
+
                     is_an_import = False

                     for relocation in container.relocations[".dyn"]:
@@ -715,10 +717,16 @@ class Symbolizer():
                         # Check if target is contained within a known region
                         in_region = self._is_target_in_region(
                             container, target)
+
                         if in_region:
                             inst.op_str = inst.op_str.replace(
                                 hex(value), ".LC%x" % (target))
                         else:
+                            for sec, sval in container.sections.items():
+                                print("%s 0x%x - 0x%x" % (sec, sval.base, sval.sz))
+                            for fn, fval in container.functions.items():
+                                print("%s 0x%x - 0x%x" % (fval.name, fval.start, fval.sz))
+                            print("[*] Adjusting memory access, context: %s %s 0x%x" % (inst, context, target))
                             target, adjust = self._adjust_target(
                                 container, target)
                             inst.op_str = inst.op_str.replace(

Applied to the code being refactored in a separate repo, these diagnostics suggest that we are unable to correctly identify a RIP-relative lea that targets a function in the text section. According to the diagnostics, neither the text section nor the function itself is correctly identified.
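
For reference, the check that fails here can be sketched in isolation. This is a minimal illustration, not RetroWrite's implementation: the `Section` class and both function names are hypothetical, and only the `base` and `sz` field names and the `ripbase + value` arithmetic are taken from the diff above. The target of a RIP-relative operand is the address of the next instruction plus the displacement, and a target belongs to a section when it lies in the half-open range from `base` to `base + sz`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Section:
    # Hypothetical stand-in for a section record: name, start address, size in bytes
    name: str
    base: int
    sz: int


def rip_relative_target(inst_address: int, inst_size: int, displacement: int) -> int:
    # RIP already points at the next instruction when the displacement is applied
    ripbase = inst_address + inst_size
    return ripbase + displacement


def find_section(sections: Iterable[Section], target: int) -> Optional[Section]:
    # A target belongs to a section if base <= target < base + sz
    for sec in sections:
        if sec.base <= target < sec.base + sec.sz:
            return sec
    return None


# Made-up addresses for illustration only
sections = [Section(".text", 0x1000, 0x800), Section(".data", 0x4000, 0x100)]
target = rip_relative_target(inst_address=0x1200, inst_size=7, displacement=-0x100)
owner = find_section(sections, target)
print(hex(target), owner.name if owner else None)
```

A lookup of this kind returning `None` for a target that does lie in the text section would be consistent with the failing `assert sec is not None`. Note that the diagnostic output prints each section's base and size, not its base and end address, which matters when comparing the printed values against the target.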

The root cause of this bug still needs to be tracked down and fixed, but it is unrelated to the previous init_array issues.

The following issues are likely related: #29, #3.