SimonKagstrom / kcov

Code coverage tool for compiled programs, Python and Bash which uses debugging information to collect and report data without special compilation options
http://simonkagstrom.github.io/kcov/
GNU General Public License v2.0
720 stars 110 forks source link

False sh engine positives on elf binaries #339

Closed jwnimmer-tri closed 3 years ago

jwnimmer-tri commented 4 years ago

First off, thanks for the wonderful tool. We use it daily.

As I understand it, kcov uses EngineFactory with matchFile to heuristically infer which engine to use for a given coverage target. This causes a problem when the first 80 bytes of an ELF binary contain the characters sh:

https://github.com/SimonKagstrom/kcov/blob/c18c77531f3fc00440571a9a04dd33ee4fcd4c39/src/engines/bash-engine.cc#L404-L405

In a recent project of mine, I ended up with a C++ binary that was incorrectly detected as shell:

(focal)jwnimmer@cons:~/tmp/kcov-repro$ env LD_LIBRARY_PATH=. ./point_cloud_flags_test 
Using drake_cc_googletest_main.cc

[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from PointCloudFlagsTest
[ RUN      ] PointCloudFlagsTest.ConstExpr
[       OK ] PointCloudFlagsTest.ConstExpr (0 ms)
[ RUN      ] PointCloudFlagsTest.Basic
[       OK ] PointCloudFlagsTest.Basic (0 ms)
[----------] 2 tests from PointCloudFlagsTest (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (1 ms total)
[  PASSED  ] 2 tests.

(focal)jwnimmer@cons:~/tmp/kcov-repro$ env LD_LIBRARY_PATH=. kcov tmp ./point_cloud_flags_test 
kcov: warning: Other status: 0x85

Here is the header of the binary:

00000000: 7f45 4c46 0201 0103 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 800e 0400 0000 0000  ..>.............
00000020: 4000 0000 0000 0000 2873 6800 0000 0000  @.......(sh.....
00000030: 0000 0000 4000 3800 0900 4000 2900 2800  ....@.8...@.).(.
00000040: 0600 0000 0400 0000 4000 0000 0000 0000  ........@.......

Notice the "sh" bytes in the e_shoff section of the ELF header.

I have attached the binary (and its required shared library) here: kcov-repro.zip. This binary is intended to run on Ubuntu 20.04.

It's source code is at https://github.com/RobotLocomotion/drake/blob/04e5c5b014d5d621c4a1e445afbc37f2d2e59c58/perception/test/point_cloud_flags_test.cc, though the failure mode is very sensitive to the compilation flags being used.

Here is a link to my CI failure: https://drake-jenkins.csail.mit.edu/view/Production/job/linux-focal-gcc-bazel-nightly-coverage/153/ -- note that this data will expire within a few weeks.

I'd like to request a way to opt-out of the heuristic engine detection, on the command line. Either by disabling certain engines, or forcing one specific engine always.

I'm using kcov as provided by Ubuntu 20.04, at version 38+dfsg-1.

SimonKagstrom commented 4 years ago

This was an interesting one!

Yes, the type-of-engine check is fairly stupid. You've analysed the situation completely correct. I think, however, it should actually be better to check for the ELF magic at the start of the file instead, and return a high match value in that case. The ELF parser already does that, so something like

diff --git a/src/engines/ptrace.cc b/src/engines/ptrace.cc
index b702e8d..0737b8e 100644
--- a/src/engines/ptrace.cc
+++ b/src/engines/ptrace.cc
@@ -6,6 +6,7 @@
 #include <output-handler.hh>
 #include <solib-handler.hh>
 #include <file-parser.hh>
+#include <libelf.h>
 #include <phdr_data.h>
 #include "ptrace_sys.hh"

@@ -463,6 +464,13 @@ public:

        unsigned int matchFile(const std::string &filename, uint8_t *data, size_t dataSize)
        {
+               Elf32_Ehdr *hdr = (Elf32_Ehdr *) data;
+
+               if (memcmp(hdr->e_ident, ELFMAG, strlen(ELFMAG)) == 0)
+               {
+                       return match_perfect;
+               }
+
                // Unless #!/bin/sh etc, this should win
                return 1;
        }

Should do the trick. I'm by an OSX computer right now, so I haven't actually tested it, but I think it should work fine on both Linux and FreeBSD.

Forgot to mention: Good debugging, not easy to identify!

jwnimmer-tri commented 4 years ago

Agreed; that sounds better!

SimonKagstrom commented 3 years ago

I've pushed a fix for this with 4425833, so closing. Sorry for the immense delay.

SimonKagstrom commented 3 years ago

@jwnimmer-tri I also saw that I left a comment above unterminated so that it said "Good debugging, not!".

I've updated it with the full string so that it doesn't sound as an insult. I intended it as a compliment, sorry about that!

jwnimmer-tri commented 3 years ago

Thanks, I appreciate the fix!

(And yes, I took your debugging comment as a typo, with no ill will.)