Open patacca opened 10 months ago
@patacca are you sure you attached the correct file? I see classes like DerivedA
, DerivedB
in it, but no Interface1
, etc. Also address 0x3da0 is the start of __elf_dynamic_table
Sorry about that. I indeed attached the wrong file! I was experimenting with virtual classes and I messed up the files. In any case if you try to compile the source file with g++ you should be able to reproduce the issue.
Here is the correct binary test.tar.gz
My apologies.
Nice job narrowing down the source of the issue, I overlooked that entirely.
BinaryView::GetRelocationsAt
API has been added in version 3.6.4745
Sorry about that. I indeed attached the wrong file! I was experimenting with virtual classes and I messed up the files. In any case if you try to compile the source file with g++ you should be able to reproduce the issue.
Here is the correct binary test.tar.gz
My apologies.
Thanks for the updated file! Now what you said all makes sense
@D0ntPanic thanks for adding the BinaryView::GetRelocationsAt
API but as it is right now it suffers from an implementation issue that should be fixed in #4866
What is the status of this issue? I see that the BinaryView::GetRelocationsAt
API is added, and a small bug in it gets fixed. But how about the original issue? Did someone look into it?
Hopefully the API is working now. You should be able to see 0x3DA0 in [hex(start) for (start, end) in bv.relocation_ranges]
or get a true response from bv.relocation_ranges_at(0x3da0)
.
Ok, so at address 0x3DA0 I think we name the location to _typeinfo_for_Interface1
because of this .symtab (local) symbol:
Elf64_Sym[30] "_ZTI10Interface1"
00003300 AD 01 00 00 .... st_name=0x1AD "_ZTI10Interface1"
00003300 21 ! st_info bind:2(WEAK) type:1(OBJECT)
00003300 00 . st_other=0x0
00003300 15 00 .. st_shndx=0x15
00003310 A0 3D 00 00 00 00 00 00 .=...... st_value=0x3DA0
00003310 10 00 00 00 00 00 00 00 ........ st_size=0x10
Indeed we can do binaryninja.demangle.demangle_gnu3(bv.arch, '_ZTI10Interface1')
and verify_typeinfo_for_Interface1
.
The value at this location is overwritten by a relocation:
Elf64_Rela
00000790 A0 3D 00 00 00 00 00 00 .=...... r_offset=0x3DA0
000007A0 01 00 00 00 02 00 00 00 ........ r_info=0x200000001 sym=0x2 type=0x1 (R_X86_64_64)
000007A0 10 00 00 00 00 00 00 00 ........ r_addend=16
It should get the address of whatever the .dynstr (dynamic) symbol 2 is:
Elf64_Sym[2] "_ZTVN10__cxxabiv117__class_type_infoE"
00000410 70 00 00 00 p... st_name=0x70 "_ZTVN10__cxxabiv117__class_type_infoE"
00000410 11 . st_info bind:1(GLOBAL) type:1(OBJECT)
00000410 00 . st_other=0x0
00000410 00 00 .. st_shndx=0x0
00000410 00 00 00 00 00 00 00 00 ........ st_value=0x0
00000420 00 00 00 00 00 00 00 00 ........ st_size=0x0
If we demangle that using binaryninja.demangle.demangle_gnu3(bv.arch, '_ZTVN10__cxxabiv117__class_type_infoE')
we get ['_vtable_for___cxxabiv1', '__class_type_info']
which seems to be represented by the contents at 4040:
00004040 extern struct vtable_for___cxxabiv1::__vmi_class_type_info _vtable_for___cxxabiv1::__vmi_class_type_info
In other words, a quick look appears to show it doing the right thing, except the +0x10 part. Another way of showing this is with bv.get_data_var_at(0x4040).name
which yields _ZTVN10__cxxabiv121__vmi_class_type_infoE
.
I'm not very knowledgeable on C++ internals. Would you mind sketching what you would expect the .extern section to look like? ~Would you want 0x4040 to stay the same, but have it occupy at least 16 bytes so 0x3DA0 could contain 0x4050?~
EDIT: please see crossed out text above, and next comment below
Would you want the 0x4040 to be displayed as 0x4030 + 0x10? With names, that would replace:
00003da0 struct vtable_for___cxxabiv1::__vmi_class_type_info* __vmi_class_type_info = _vtable_for___cxxabiv1::__vmi_class_type_info
with:
00003da0 struct vtable_for___cxxabiv1::__class_type_info* __class_type_info = _vtable_for___cxxabiv1::__class_type_info + 0x10
And in picture form, red is what we currently show, but green is what would be preferred, right?
Sorry for the delay but I have been busy recently. There are few points that I would like to address:
Both bv.relocation_ranges
and bv.relocation_ranges_at
don't give a Relocation
object, from which it is possible to know the symbol being referenced, the offset and so on. The C++ API BinaryView::GetRelocationsAt
instead does the job perfectly so it's a good enough workaround.
Elf64_Rela
00000790 A0 3D 00 00 00 00 00 00 .=...... r_offset=0x3DA0
000007A0 01 00 00 00 02 00 00 00 ........ r_info=0x200000001 sym=0x2 type=0x1 (R_X86_64_64)
000007A0 10 00 00 00 00 00 00 00 ........ r_addend=16
As you correctly said this is relocating the content of address 0x3da0 with the address of whatever the .dynstr (dynamic) symbol 2 is (in this case _ZTVN10__cxxabiv117__class_type_infoE
, aka _vtable_for___cxxabiv1::__class_type_info
), PLUS the offset r_addend
.
So in the end the relocated value should be _vtable_for___cxxabiv1::__class_type_info + 0x10
, of course it will be an address inside the imported library virtual memory.
If you look with gdb it's more clear:
[11:07] >> gdb ./test
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
warning: /usr/share/pwndbg/gdbinit.py: No such file or directory
Reading symbols from ./test...
Downloading separate debug info for /tmp/test
(No debugging symbols found in ./test)
(gdb) b main
Breakpoint 1 at 0x113d
(gdb) r
Starting program: /tmp/test
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, 0x000055555555513d in main ()
(gdb) info proc mappings
process 2422286
Mapped address spaces:
Start Addr End Addr Size Offset Perms objfile
0x555555554000 0x555555555000 0x1000 0x0 r--p /tmp/test
0x555555555000 0x555555556000 0x1000 0x1000 r-xp /tmp/test
0x555555556000 0x555555557000 0x1000 0x2000 r--p /tmp/test
0x555555557000 0x555555558000 0x1000 0x2000 r--p /tmp/test
0x555555558000 0x555555559000 0x1000 0x3000 rw-p /tmp/test
0x555555559000 0x55555557a000 0x21000 0x0 rw-p [heap]
0x7ffff7a1e000 0x7ffff7a44000 0x26000 0x0 r--p /usr/lib/libc.so.6
0x7ffff7a44000 0x7ffff7b9e000 0x15a000 0x26000 r-xp /usr/lib/libc.so.6
0x7ffff7b9e000 0x7ffff7bf2000 0x54000 0x180000 r--p /usr/lib/libc.so.6
0x7ffff7bf2000 0x7ffff7bf6000 0x4000 0x1d3000 r--p /usr/lib/libc.so.6
0x7ffff7bf6000 0x7ffff7bf8000 0x2000 0x1d7000 rw-p /usr/lib/libc.so.6
0x7ffff7bf8000 0x7ffff7c00000 0x8000 0x0 rw-p
0x7ffff7c00000 0x7ffff7c9c000 0x9c000 0x0 r--p /usr/lib/libstdc++.so.6.0.32
0x7ffff7c9c000 0x7ffff7dd3000 0x137000 0x9c000 r-xp /usr/lib/libstdc++.so.6.0.32
0x7ffff7dd3000 0x7ffff7e69000 0x96000 0x1d3000 r--p /usr/lib/libstdc++.so.6.0.32
0x7ffff7e69000 0x7ffff7e76000 0xd000 0x269000 r--p /usr/lib/libstdc++.so.6.0.32
0x7ffff7e76000 0x7ffff7e77000 0x1000 0x276000 rw-p /usr/lib/libstdc++.so.6.0.32
0x7ffff7e77000 0x7ffff7e7b000 0x4000 0x0 rw-p
0x7ffff7e81000 0x7ffff7e85000 0x4000 0x0 rw-p
0x7ffff7e85000 0x7ffff7e89000 0x4000 0x0 r--p /usr/lib/libgcc_s.so.1
0x7ffff7e89000 0x7ffff7ea4000 0x1b000 0x4000 r-xp /usr/lib/libgcc_s.so.1
0x7ffff7ea4000 0x7ffff7ea8000 0x4000 0x1f000 r--p /usr/lib/libgcc_s.so.1
0x7ffff7ea8000 0x7ffff7ea9000 0x1000 0x22000 r--p /usr/lib/libgcc_s.so.1
0x7ffff7ea9000 0x7ffff7eaa000 0x1000 0x23000 rw-p /usr/lib/libgcc_s.so.1
0x7ffff7eaa000 0x7ffff7eba000 0x10000 0x0 r--p /usr/lib/libm.so.6
0x7ffff7eba000 0x7ffff7f39000 0x7f000 0x10000 r-xp /usr/lib/libm.so.6
0x7ffff7f39000 0x7ffff7f95000 0x5c000 0x8f000 r--p /usr/lib/libm.so.6
0x7ffff7f95000 0x7ffff7f96000 0x1000 0xea000 r--p /usr/lib/libm.so.6
0x7ffff7f96000 0x7ffff7f97000 0x1000 0xeb000 rw-p /usr/lib/libm.so.6
0x7ffff7f97000 0x7ffff7f99000 0x2000 0x0 rw-p
0x7ffff7fc4000 0x7ffff7fc8000 0x4000 0x0 r--p [vvar]
0x7ffff7fc8000 0x7ffff7fca000 0x2000 0x0 r-xp [vdso]
0x7ffff7fca000 0x7ffff7fcb000 0x1000 0x0 r--p /usr/lib/ld-linux-x86-64.so.2
0x7ffff7fcb000 0x7ffff7ff1000 0x26000 0x1000 r-xp /usr/lib/ld-linux-x86-64.so.2
0x7ffff7ff1000 0x7ffff7ffb000 0xa000 0x27000 r--p /usr/lib/ld-linux-x86-64.so.2
0x7ffff7ffb000 0x7ffff7ffd000 0x2000 0x31000 r--p /usr/lib/ld-linux-x86-64.so.2
0x7ffff7ffd000 0x7ffff7fff000 0x2000 0x33000 rw-p /usr/lib/ld-linux-x86-64.so.2
0x7ffffffde000 0x7ffffffff000 0x21000 0x0 rw-p [stack]
0xffffffffff600000 0xffffffffff601000 0x1000 0x0 --xp [vsyscall]
(gdb) x/gx 0x555555554000+0x3da0
0x555555557da0 <_ZTI10Interface1>: 0x00007ffff7e6ac90
(gdb) x/gx *(uint64_t *)(0x555555554000+0x3da0)
0x7ffff7e6ac90 <_ZTVN10__cxxabiv117__class_type_infoE+16>: 0x00007ffff7cae8a0
As you can see the address of 0x3da0, after the relocation, points to 0x7ffff7e6ac90 (aka _ZTVN10__cxxabiv117__class_type_infoE+16
) that lives inside the read-only section of /usr/lib/libstdc++.so.6.0.32
.
As you have pointed out with the red/green arrows, right now BN will consider the offset r_addend
as being applied immediately, hence the relocated address becomes 0x4030 + 0x10 = 0x4040
that is _vtable_for___cxxabiv1::__vmi_class_type_info
that is the symbol that just so happen to be present at that address, but it has nothing to do with the current relocation. Any symbol might have been present at 0x4040, even from a different imported library.
It's not trivial to figure out what is the best way of handling offsets in relocations (see the same problem in ghidra https://github.com/NationalSecurityAgency/ghidra/issues/2261) because there's no reliable way of knowing what will be present inside a library at address symbol+offset
.
Imho it is better to just leave it out for the reverser, so display it with the explicit offset, like _vtable_for___cxxabiv1::__class_type_info + 0x10
.
It's important to understand though that _vtable_for___cxxabiv1::__class_type_info + 0x10
is different from _vtable_for___cxxabiv1::__vmi_class_type_info
.
Version and Platform (required):
Bug Description: Address pointing to the incorrect external symbol due to the wrong interpretation of the relocation data. Steps To Reproduce: Consider the following c++ code (the executable is attached here test.tar.gz EDIT file updated)
Binary Ninja correctly identifies the RTTI structures for the classes
Interface1
andInterface2
. Now consider the variable_typeinfo_for_Interface1
(address 0x3da0 in the binary attached): BN doesn't assign a meaningful data type to that variable but in fact it is pointer to astd::type_info
derived class (for exampleabi::_class_type_info
orabi::__vmi_class_type_info
) that is an external symbol imported from the standard c++ library.There is an entry in the
.rela.dyn
section for that address, relocating it to the right symbol in libc++.so:_ZTVN10__cxxabiv117__class_type_infoE@CXXABI_1.3 + 10
(notice there is an offset of 0x10 to the symbol, that is necessary by following c++ Itanium ABI).By changing the type of
_typeinfo_for_Interface1
tovoid *
we can see that the address being pointed is not the correct one (in this case_ZTVN10__cxxabiv117__class_type_infoE@CXXABI_1.3 + 0x10
) but instead we see a different one (in the binary attached it is_vtable_for___cxxabiv1::__vmi_class_type_info
).My guess is that the offset of 0x10 is applied at the wrong moment (for example in the exported symbols table of the library or as a raw offset in the binary itself?) resulting in a wrong function being imported.
Moreover there is no API (python or C++) that lets you access the
BinaryNinja::Relocation
objects. Supposedly, there should beBinaryNinja::Segment::GetRelocationsInRange
that should return a vector of relocations but even though it is listed in binaryninjaapi.h, the symbol is not exported in the binary itself.Exposing such method could at least mitigate this problem as I could always "relocate" manually the addresses.