Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
928 stars 209 forks source link

Wrong relocation on c++ ELF binaries when offsets are present #4858

Open patacca opened 10 months ago

patacca commented 10 months ago

Version and Platform (required):

Bug Description: Address pointing to the incorrect external symbol due to the wrong interpretation of the relocation data. Steps To Reproduce: Consider the following c++ code (the executable is attached here test.tar.gz EDIT file updated)

  struct Interface1 {
      virtual void foo() {};
  };
  struct Interface2 : virtual Interface1 {
      virtual void bar() {};
  };

  int main() {
    Interface2 *a = new Interface2();
    return 0;
  }

# build with g++ -o test test.cpp

Binary Ninja correctly identifies the RTTI structures for the classes Interface1 and Interface2. Now consider the variable _typeinfo_for_Interface1 (address 0x3da0 in the binary attached): BN doesn't assign a meaningful data type to that variable but in fact it is pointer to a std::type_info derived class (for example abi::_class_type_info or abi::__vmi_class_type_info) that is an external symbol imported from the standard c++ library.

There is an entry in the .rela.dyn section for that address, relocating it to the right symbol in libc++.so: _ZTVN10__cxxabiv117__class_type_infoE@CXXABI_1.3 + 10 (notice there is an offset of 0x10 to the symbol, that is necessary by following c++ Itanium ABI).

By changing the type of _typeinfo_for_Interface1 to void * we can see that the address being pointed is not the correct one (in this case _ZTVN10__cxxabiv117__class_type_infoE@CXXABI_1.3 + 0x10) but instead we see a different one (in the binary attached it is _vtable_for___cxxabiv1::__vmi_class_type_info).

My guess is that the offset of 0x10 is applied at the wrong moment (for example in the exported symbols table of the library or as a raw offset in the binary itself?) resulting in a wrong function being imported.

Moreover there is no API (python or C++) that lets you access the BinaryNinja::Relocation objects. Supposedly, there should be BinaryNinja::Segment::GetRelocationsInRange that should return a vector of relocations but even though it is listed in binaryninjaapi.h, the symbol is not exported in the binary itself.

Exposing such method could at least mitigate this problem as I could always "relocate" manually the addresses.

xusheng6 commented 10 months ago

@patacca are you sure you attached the correct file? I see classes like DerivedA, DerivedB in it, but no Interface1, etc. Also address 0x3da0 is the start of __elf_dynamic_table

patacca commented 10 months ago

Sorry about that. I indeed attached the wrong file! I was experimenting with virtual classes and I messed up the files. In any case if you try to compile the source file with g++ you should be able to reproduce the issue.

Here is the correct binary test.tar.gz

My apologies.

ExecuteProtect commented 10 months ago

Nice job narrowing down the source of the issue, I overlooked that entirely.

D0ntPanic commented 10 months ago

BinaryView::GetRelocationsAt API has been added in version 3.6.4745

xusheng6 commented 10 months ago

Sorry about that. I indeed attached the wrong file! I was experimenting with virtual classes and I messed up the files. In any case if you try to compile the source file with g++ you should be able to reproduce the issue.

Here is the correct binary test.tar.gz

My apologies.

Thanks for the updated file! Now what you said all makes sense

patacca commented 10 months ago

@D0ntPanic thanks for adding the BinaryView::GetRelocationsAt API but as it is right now it suffers from an implementation issue that should be fixed in #4866

xusheng6 commented 10 months ago

What is the status of this issue? I see that the BinaryView::GetRelocationsAt API is added, and a small bug in it gets fixed. But how about the original issue? Did someone look into it?

lwerdna commented 9 months ago

Hopefully the API is working now. You should be able to see 0x3DA0 in [hex(start) for (start, end) in bv.relocation_ranges] or get a true response from bv.relocation_ranges_at(0x3da0).

Ok, so at address 0x3DA0 I think we name the location to _typeinfo_for_Interface1 because of this .symtab (local) symbol:

                                                                               Elf64_Sym[30] "_ZTI10Interface1"
  00003300                         AD 01 00 00                      ....         st_name=0x1AD "_ZTI10Interface1"
  00003300                                     21                       !        st_info bind:2(WEAK) type:1(OBJECT)
  00003300                                        00                     .       st_other=0x0
  00003300                                           15 00                ..     st_shndx=0x15
  00003310 A0 3D 00 00 00 00 00 00                          .=......             st_value=0x3DA0
  00003310                         10 00 00 00 00 00 00 00          ........     st_size=0x10

Indeed we can do binaryninja.demangle.demangle_gnu3(bv.arch, '_ZTI10Interface1') and verify_typeinfo_for_Interface1.

The value at this location is overwritten by a relocation:

                                                                               Elf64_Rela
  00000790                         A0 3D 00 00 00 00 00 00          .=......     r_offset=0x3DA0
  000007A0 01 00 00 00 02 00 00 00                          ........             r_info=0x200000001 sym=0x2 type=0x1 (R_X86_64_64)
  000007A0                         10 00 00 00 00 00 00 00          ........     r_addend=16

It should get the address of whatever the .dynstr (dynamic) symbol 2 is:

                                                                               Elf64_Sym[2] "_ZTVN10__cxxabiv117__class_type_infoE"
  00000410 70 00 00 00                                      p...                 st_name=0x70 "_ZTVN10__cxxabiv117__class_type_infoE"
  00000410             11                                       .                st_info bind:1(GLOBAL) type:1(OBJECT)
  00000410                00                                     .               st_other=0x0
  00000410                   00 00                                ..             st_shndx=0x0
  00000410                         00 00 00 00 00 00 00 00          ........     st_value=0x0
  00000420 00 00 00 00 00 00 00 00                          ........             st_size=0x0

If we demangle that using binaryninja.demangle.demangle_gnu3(bv.arch, '_ZTVN10__cxxabiv117__class_type_infoE') we get ['_vtable_for___cxxabiv1', '__class_type_info'] which seems to be represented by the contents at 4040:

00004040  extern struct vtable_for___cxxabiv1::__vmi_class_type_info _vtable_for___cxxabiv1::__vmi_class_type_info

In other words, a quick look appears to show it doing the right thing, except the +0x10 part. Another way of showing this is with bv.get_data_var_at(0x4040).name which yields _ZTVN10__cxxabiv121__vmi_class_type_infoE.

I'm not very knowledgeable on C++ internals. Would you mind sketching what you would expect the .extern section to look like? ~Would you want 0x4040 to stay the same, but have it occupy at least 16 bytes so 0x3DA0 could contain 0x4050?~

EDIT: please see crossed out text above, and next comment below

lwerdna commented 9 months ago

Would you want the 0x4040 to be displayed as 0x4030 + 0x10? With names, that would replace:

00003da0  struct vtable_for___cxxabiv1::__vmi_class_type_info* __vmi_class_type_info = _vtable_for___cxxabiv1::__vmi_class_type_info

with:

00003da0  struct vtable_for___cxxabiv1::__class_type_info* __class_type_info = _vtable_for___cxxabiv1::__class_type_info + 0x10

And in picture form, red is what we currently show, but green is what would be preferred, right?

image

patacca commented 9 months ago

Sorry for the delay but I have been busy recently. There are few points that I would like to address:

As you correctly said this is relocating the content of address 0x3da0 with the address of whatever the .dynstr (dynamic) symbol 2 is (in this case _ZTVN10__cxxabiv117__class_type_infoE, aka _vtable_for___cxxabiv1::__class_type_info), PLUS the offset r_addend. So in the end the relocated value should be _vtable_for___cxxabiv1::__class_type_info + 0x10, of course it will be an address inside the imported library virtual memory.

If you look with gdb it's more clear:

[11:07] >> gdb ./test 
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

warning: /usr/share/pwndbg/gdbinit.py: No such file or directory
Reading symbols from ./test...
Downloading separate debug info for /tmp/test
(No debugging symbols found in ./test)                                                                                                                                                                             
(gdb) b main
Breakpoint 1 at 0x113d
(gdb) r
Starting program: /tmp/test 
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000                                                                                                                                          
[Thread debugging using libthread_db enabled]                                                                                                                                                                      
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Breakpoint 1, 0x000055555555513d in main ()
(gdb) info proc mappings 
process 2422286
Mapped address spaces:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x555555554000     0x555555555000     0x1000        0x0  r--p   /tmp/test
      0x555555555000     0x555555556000     0x1000     0x1000  r-xp   /tmp/test
      0x555555556000     0x555555557000     0x1000     0x2000  r--p   /tmp/test
      0x555555557000     0x555555558000     0x1000     0x2000  r--p   /tmp/test
      0x555555558000     0x555555559000     0x1000     0x3000  rw-p   /tmp/test
      0x555555559000     0x55555557a000    0x21000        0x0  rw-p   [heap]
      0x7ffff7a1e000     0x7ffff7a44000    0x26000        0x0  r--p   /usr/lib/libc.so.6
      0x7ffff7a44000     0x7ffff7b9e000   0x15a000    0x26000  r-xp   /usr/lib/libc.so.6
      0x7ffff7b9e000     0x7ffff7bf2000    0x54000   0x180000  r--p   /usr/lib/libc.so.6
      0x7ffff7bf2000     0x7ffff7bf6000     0x4000   0x1d3000  r--p   /usr/lib/libc.so.6
      0x7ffff7bf6000     0x7ffff7bf8000     0x2000   0x1d7000  rw-p   /usr/lib/libc.so.6
      0x7ffff7bf8000     0x7ffff7c00000     0x8000        0x0  rw-p   
      0x7ffff7c00000     0x7ffff7c9c000    0x9c000        0x0  r--p   /usr/lib/libstdc++.so.6.0.32
      0x7ffff7c9c000     0x7ffff7dd3000   0x137000    0x9c000  r-xp   /usr/lib/libstdc++.so.6.0.32
      0x7ffff7dd3000     0x7ffff7e69000    0x96000   0x1d3000  r--p   /usr/lib/libstdc++.so.6.0.32
      0x7ffff7e69000     0x7ffff7e76000     0xd000   0x269000  r--p   /usr/lib/libstdc++.so.6.0.32
      0x7ffff7e76000     0x7ffff7e77000     0x1000   0x276000  rw-p   /usr/lib/libstdc++.so.6.0.32
      0x7ffff7e77000     0x7ffff7e7b000     0x4000        0x0  rw-p   
      0x7ffff7e81000     0x7ffff7e85000     0x4000        0x0  rw-p   
      0x7ffff7e85000     0x7ffff7e89000     0x4000        0x0  r--p   /usr/lib/libgcc_s.so.1
      0x7ffff7e89000     0x7ffff7ea4000    0x1b000     0x4000  r-xp   /usr/lib/libgcc_s.so.1
      0x7ffff7ea4000     0x7ffff7ea8000     0x4000    0x1f000  r--p   /usr/lib/libgcc_s.so.1
      0x7ffff7ea8000     0x7ffff7ea9000     0x1000    0x22000  r--p   /usr/lib/libgcc_s.so.1
      0x7ffff7ea9000     0x7ffff7eaa000     0x1000    0x23000  rw-p   /usr/lib/libgcc_s.so.1
      0x7ffff7eaa000     0x7ffff7eba000    0x10000        0x0  r--p   /usr/lib/libm.so.6
      0x7ffff7eba000     0x7ffff7f39000    0x7f000    0x10000  r-xp   /usr/lib/libm.so.6
      0x7ffff7f39000     0x7ffff7f95000    0x5c000    0x8f000  r--p   /usr/lib/libm.so.6
      0x7ffff7f95000     0x7ffff7f96000     0x1000    0xea000  r--p   /usr/lib/libm.so.6
      0x7ffff7f96000     0x7ffff7f97000     0x1000    0xeb000  rw-p   /usr/lib/libm.so.6
      0x7ffff7f97000     0x7ffff7f99000     0x2000        0x0  rw-p   
      0x7ffff7fc4000     0x7ffff7fc8000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc8000     0x7ffff7fca000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fca000     0x7ffff7fcb000     0x1000        0x0  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x7ffff7fcb000     0x7ffff7ff1000    0x26000     0x1000  r-xp   /usr/lib/ld-linux-x86-64.so.2
      0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x27000  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x7ffff7ffb000     0x7ffff7ffd000     0x2000    0x31000  r--p   /usr/lib/ld-linux-x86-64.so.2
      0x7ffff7ffd000     0x7ffff7fff000     0x2000    0x33000  rw-p   /usr/lib/ld-linux-x86-64.so.2
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]
(gdb) x/gx 0x555555554000+0x3da0
0x555555557da0 <_ZTI10Interface1>:  0x00007ffff7e6ac90
(gdb) x/gx *(uint64_t *)(0x555555554000+0x3da0)
0x7ffff7e6ac90 <_ZTVN10__cxxabiv117__class_type_infoE+16>:  0x00007ffff7cae8a0

As you can see the address of 0x3da0, after the relocation, points to 0x7ffff7e6ac90 (aka _ZTVN10__cxxabiv117__class_type_infoE+16) that lives inside the read-only section of /usr/lib/libstdc++.so.6.0.32.

As you have pointed out with the red/green arrows, right now BN will consider the offset r_addend as being applied immediately, hence the relocated address becomes 0x4030 + 0x10 = 0x4040 that is _vtable_for___cxxabiv1::__vmi_class_type_info that is the symbol that just so happen to be present at that address, but it has nothing to do with the current relocation. Any symbol might have been present at 0x4040, even from a different imported library.

It's not trivial to figure out what is the best way of handling offsets in relocations (see the same problem in ghidra https://github.com/NationalSecurityAgency/ghidra/issues/2261) because there's no reliable way of knowing what will be present inside a library at address symbol+offset. Imho it is better to just leave it out for the reverser, so display it with the explicit offset, like _vtable_for___cxxabiv1::__class_type_info + 0x10. It's important to understand though that _vtable_for___cxxabiv1::__class_type_info + 0x10 is different from _vtable_for___cxxabiv1::__vmi_class_type_info.