Closed bossmc closed 1 year ago
I also considered (and in fact, mostly implemented, before realising this simpler change) making the BFD parser track sections at their loaded addresses rather than at their paddr
offsets. Since the loader will never [citation needed] overlap sections this would allow the parser to load each ELF in any order and to validate addresses/look up basic-blocks from any ELF in any order too without risk of bleed between ELF images.
This change is more complex (not much more but some), and is very dependent on the solib
shim being available and in use (otherwise we can't work out the load addresses). It also requires the solib
shim reporting more than the relocation base of the main binary (it has to report each segment's load address) which looks like it might be awkward for e.g. static binaries.
If you're interested I can tidy up the prototype and share it with you?
Wow, awsome work!
As for the alternative implementation, I think I prefer this one. I regularly use --skip-solibs
to avoid the LD_PRELOAD:ed solib parser, as it doesn't work together with ASAN.
I saw that there is a missing include in bfd-disassembler.cc:
/home/runner/work/kcov/kcov/src/parsers/bfd-disassembler.cc: In member function ‘BfdDisassembler::Section* BfdDisassembler::lookupSection(uint64_t)’: /home/runner/work/kcov/kcov/src/parsers/bfd-disassembler.cc:328:36: error: ‘logic_error’ is not a member of ‘std’ 328 | throw std::logic_error("Section search invariant broken");
Correct, I'd been working on the fix in a docker container (where I had a binary that was reproducing this) and, when I copied the changes out to my git workspace to create this PR, I didn't notice that cmake
had failed to find libbfd
and thus didn't compile in the bfd-disassembler
... I've added the missing header now.
I think the alternative implementation would work with --skip-solibs
since it would load the main binary as if it were loaded at address 0x0, but since there are no additional SOs to load, there would be no risk of overlapping sections. OTOH if you're happy with this fix I'm fine not to bring in the additional complexity.
Base: 63.97% // Head: 63.97% // Increases project coverage by +0.00%
:tada:
Coverage data is based on head (
0b8d733
) compared to base (59bd329
). Patch coverage: 57.14% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Thanks a lot for this fix! Very good work and I appreciate the detailed analysis of the problem and your solution!
No worries, thanks for kcov! Are you planning to cut a new release soon? It would be nice to be able to use an official release if we can.
I'll make a release with this feature shortly then!
At a high-level, this makes the BFD disassembler forget all state from any previous ELF file that it's parsed when it's
setup
with a new one. This prevents:This "forget all previous state" doesn't feel very user-friendly (it means the caller has to do everything it wants to with one ELF before moving on to the next one) but it's consistent with how the
elf-parser
code works.Background
I was hitting https://github.com/SimonKagstrom/kcov/issues/102 when covering some (very simple) rust code on Ubuntu Bionic. The bad instruction was occurring in exactly the place mentioned in that issue, in
__pthread_enable_asynccancel
fromlibpthread.so
, where the ASM source code looks like:Note that the
lock
prefix is on a separate line to thecmpxchgl
(even though they're two parts of the same operation) andgcc
andclang
both helpfully insert separate debug info for those two lines, one pointing at thelock
prefix opcode and the other pointing at the rest of the instruction.The BFD disassembler can correctly see that the
cmpxchgl
is not a valid location to place anint3
since it's not the start of an opcode, so normally this debug address is rejected as an invalid address (at least with--verify
). Despite this I was seeing thecmpxchg
being replaced withint3
leading toSIG_ILL
s.What was happening was that a previously inspected SO (possibly the original binary?) had included a section that started after
libpthread
's.text
section, but ended before the offset of the code related to__pthread_enable_asynccancel
. When the BFD parser was asked if thecmpxchg
address was valid, it attempted to find the section that contained it, but the section searching code assumes (sensibly) that sections can't overlap, so it determined that the only section that could contain the address was the one left over from the earlier ELF. This section didn't extend far enough to cover the address so the BFD parser determined that the searched for address wasn't in a known section, and hence reported it as "valid" (for safety I guess?). In another universe the address might have been found in a section but might have been marked asvalid
since a previous ELF had a valid instruction starting at that offset which would have led to the same end behaviour.This seems to be incredibly hard to reproduce minimally (maybe by writing a custom linker script to lay out the sections manually?) and I imagine that the next release of libc/libpthread/rustc/ld/anything would have re-hidden the problem. But better to fix it once and for all now that I've dug into it.
Additional Changes
I also noticed and fixed up two other issues in the BFD parser code:
jczx
,jezx
andloop
as branching instructions (instead looking for ajczxjezxloop
opcode)upper_bound
would returnend()
and the old code would bail out there)