Closed roypat closed 1 year ago
Thanks for the report, and I really appreciate the work you put into reproducing it! Unfortunately, I think this is more of a compiler bug than a kcov bug, although kcov tries to work around it. Since this case has so many invalid instructions, perhaps there is some systematic issue with the DWARF generation that triggers it? It might be a good idea to report it to the toolchain people.
It would be slightly interesting to see the disassembly of the addresses where the crash occurs. kcov (via binutils libbfd) disassembles the binaries to try to avoid this case, but apparently it misses these places.
I haven't seen so many invalid breakpoints in so few lines before, so it's not strange in this case that coverage collection also breaks down as a consequence.
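To make the "instruction boundary" idea concrete, here is a minimal sketch of the kind of check described above. It is not kcov's actual code: the function names are hypothetical, and the instruction lengths (4, 4, 1 starting at 0xa94) are taken from the `_fini` listing quoted elsewhere in this thread.

```rust
/// Returns the start address of each instruction, given the address of the
/// first instruction and the length in bytes of every instruction in order.
/// (Hypothetical helper; a real tool would get the lengths from a disassembler.)
fn instruction_starts(base: u64, lengths: &[u64]) -> Vec<u64> {
    let mut starts = Vec::with_capacity(lengths.len());
    let mut addr = base;
    for &len in lengths {
        starts.push(addr);
        addr += len;
    }
    starts
}

/// A breakpoint address is only safe if it is the first byte of an instruction.
/// `starts` must be in ascending order, which `instruction_starts` guarantees.
fn is_instruction_boundary(starts: &[u64], addr: u64) -> bool {
    starts.binary_search(&addr).is_ok()
}
```

With the `_fini` lengths, 0xa98 is accepted as a boundary while 0xa9b (the immediate byte of the `add`) is rejected, which is the situation the "not at an instruction boundary, skipping" messages report.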
Thanks for the quick response! Yeah, I was afraid it would be a rustc issue. You're probably right that something goes fundamentally wrong with the generation of the debug symbols for crates linking against proc-macro (we have another crate where kcov reports thousands of invalid breakpoints). I'll try to investigate the generated debug symbols and open an issue on the rust-lang repository then!
I'm using the minimal binary produced by https://github.com/roypat/break-kcov for the following debug info:
Running kcov through gdb yields
Analysing the coredump the SIGSEGV produces with gdb yields
Running objdump -d on the binary in question yields
0000000000000a94 <_fini>:
a94: 48 83 ec 08 sub $0x8,%rsp
a98: 48 83 c4 08 add $0x8,%rsp
a9c: c3 retq
My best guess would be that the breakpoint somehow gets placed in the data part of either the add or the sub instruction, which corrupts the stack by moving the stack pointer too far, leading retq to pop some nonsense address from the stack and jump to it?
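The guess above can be illustrated with a small sketch. It assumes the breakpoint is the one-byte x86 `int3` opcode (0xCC) and that it lands on the immediate byte of one of the two `_fini` instructions; the helper names are hypothetical, and the decoder only understands these two encodings. It shows what the guess would imply, not a confirmed diagnosis.

```rust
/// Opcode of the x86 `int3` instruction, the one-byte software breakpoint.
const INT3: u8 = 0xCC;

/// Decodes only the 4-byte encodings `48 83 c4 ib` (add $imm8,%rsp) and
/// `48 83 ec ib` (sub $imm8,%rsp) and returns the signed change to %rsp.
/// Any other byte pattern yields None; this is not a general x86 decoder.
fn rsp_delta(insn: [u8; 4]) -> Option<i64> {
    let imm = insn[3] as i8 as i64; // imm8 is sign-extended to 64 bits
    match [insn[0], insn[1], insn[2]] {
        [0x48, 0x83, 0xC4] => Some(imm),  // add: %rsp += imm
        [0x48, 0x83, 0xEC] => Some(-imm), // sub: %rsp -= imm
        _ => None,
    }
}

/// Overwrites one byte with int3, as a debugger does when it sets a breakpoint.
fn patch_breakpoint(mut insn: [u8; 4], offset: usize) -> [u8; 4] {
    insn[offset] = INT3;
    insn
}
```

Under these assumptions, patching offset 3 of `48 83 c4 08` turns `add $0x8,%rsp` into an add of -52 (0xCC sign-extended), so the stack pointer ends up 60 bytes away from where `retq` expects the return address.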
Okay, should anyone else also run into this issue: when invoking rustc through cargo during cross-compilation, it ignores the value of the RUSTFLAGS environment variable. This variable is used to pass the -Clink-dead-code codegen option to rustc, which tells it not to eliminate dead code (due to how proc-macros work in Rust, this would otherwise happen quite extensively). This is why the coverage reports include so few instrumented lines.
Furthermore, without this option rustc seems to generate invalid DWARF debug data in the sense that there's way too much of it, which is the reason for the kcov warnings.
I am closing this issue, since it's not a kcov bug, but rather an unfortunate interaction between a cargo feature and seemingly a rustc bug.
Thanks a lot for the detailed analysis! I'm not a rust developer myself, but apparently many Rust users have these issues with kcov. I'll refer to your findings for hints on how to work around them - it sounds like this can be one of the root causes for many of these issues.
Hello,
We recently started investigating some CI failures on a pull request to firecracker. Particularly, kcov seems to exit with kcov: Process exited with signal 11 (SIGSEGV). We believe we managed to track down the issue to two culprits: the x86_64-unknown-linux-musl target that we build for, and the presence of procedural macros (the one outlier being kcov also segfaulting inside the docker container we run our CI tests in, even with the toolchain specified as ...-gnu). Particularly, we managed to reproduce the segmentation fault on a different crate of ours which also provides procedural macros. On a fresh Ubuntu 22.04.1 install that only has Rust (with the musl target) and kcov installed, running the following commands yields
Upon testing different procedural macro crates (the two in the firecracker PR, a minimal crashing example here and rocket) on the musl target, we observed that kcov doesn't always segfault, but it seems to always produce the invalid breakpoints, so these two things seem to be the common denominator here.
Running with --debug=4 yields a bunch of statements of the form kcov: Address 0x19ab is not at an instruction boundary, skipping, and running with --debug=15 yields output that ends with
similar to #212 and #153. While the solution proposed there of limiting kcov to only run on the source directory, e.g.
prevents the segmentation fault from occurring, it does not fix the invalid breakpoints. The output becomes
This is an issue for us, since these invalid breakpoints cause the generation of incorrect coverage reports. Specifically, running the above command in the src directory of versionize_derive results in a coverage report that states 100% coverage, but only lists 17 lines as instrumented, even though the project is several hundred lines long.
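A short sketch of why such a report can still read 100%, assuming the percentage is computed over instrumented lines only (the function name is hypothetical and this is not kcov's code):

```rust
/// Line coverage as a percentage of *instrumented* lines.
/// Lines that were never instrumented do not enter the calculation at all.
/// Returns None when nothing was instrumented.
fn coverage_percent(covered: u32, instrumented: u32) -> Option<f64> {
    if instrumented == 0 {
        return None;
    }
    Some(100.0 * f64::from(covered) / f64::from(instrumented))
}
```

With 17 covered out of 17 instrumented lines the result is 100%, no matter how many source lines the crate has. If, hypothetically, 300 lines had been instrumented with the same 17 covered, the figure would be under 6%.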