Open mthebridge opened 7 years ago
Don't be so humble! You've even found the faulty commit! That said, it's a bit surprising that this particular commit causes a crash: it shouldn't change breakpoint setting very much.
Do you use the #180 feature? I.e., via LCOV_EXCL_START?
Is this consistent across every run, or does it sometimes work?
Thanks for the speedy reply! It doesn't matter whether I'm using LCOV_EXCL_START or not, and yes, it's 100% consistent - always fails on master, always works with v33...
And the commit immediately before e620da4 (i.e., 860a324) always works? Sorry for doubting git bisect a bit ;-)
Yes. I just manually checked out those two revs and compared.
On e620da4:
$ cargo kcov
error: failed to get coverage
caused by: signal: 11
On 860a324:
$ cargo kcov
kcov: 0 invalid breakpoints skipped...
<snipped>
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured
OK, great!
If you run kcov with --debug=15, you should be able to see how/if the breakpoint setting changes between the two versions. I think there must be something wrong related to breakpoint setting, so that a breakpoint gets misplaced (i.e. what --verify works around) and therefore causes the crash.
The output will be quite long, but should be possible to pass to diff.
Hmm - if I set --debug=15 then both cases fail! The previously working case gives me:
$ cargo kcov -- --debug=15 > kcov_debug.out
error: failed to get coverage
caused by: exit code: 255
And the end of the debug output for this case is:
PT stopped PID 8671 0x0000057f
PT BP at 0x7ffff71050cd:-1 for 8671
REPORT hit at 0x7ffff71050cd
PT continuing 8671 with signal 0
PT stopped PID 8671 0x0000057f
PT BP at 0x7ffff71050db:-1 for 8671
REPORT hit at 0x7ffff71050db
PT continuing 8671 with signal 0
PT stopped PID 8671 0x00000b7f
PT signal 11 at 0x7ffff71050da for 8671
PT continuing 8671 with signal 11
PT stopped PID 8671 0x0000000b
PT terminating signal 11 at 0x7ffff71050da for 8671
PT continuing 8671 with signal 0
PT error for 8671: -1
Returning error
The previously broken case has 7000 lines of debug output, the previously working case 220000 - I assume you don't want to see 200k lines of diff! The 7000 lines in the "broken" case are the same as the first 7000 in the "working" case, but for the "broken" one the output just stops mid-line. No obvious errors in the "broken" version.
OK, strange. That almost sounds like kcov itself is crashing (as in #207) if it stops mid-line. Do you get a core file for kcov in that case?
I'm busy with real-life things right now (repainting a room), but I'll take a deeper look at the commit after that to see if I can figure out why it breaks things.
No problem about the delay - enjoy your redecorating!
You're right - running it with debug causes kcov itself to crash, and I do get a core.
Output from loading in gdb (running kcov at e620da4, Linux x86_64):
$ gdb `which kcov` core.7908
<snip>
Core was generated by `kcov --debug=15 /data/mlt/rustproj/slog-extlog/slog-extlog-derive/target/cov/sl'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f7e4b234295 in std::string::find(char const*, unsigned long, unsigned long) const () from /lib64/libstdc++.so.6
(gdb) bt
#0 0x00007f7e4b234295 in std::string::find(char const*, unsigned long, unsigned long) const () from /lib64/libstdc++.so.6
#1 0x0000000000430fd1 in find (__pos=0, __str="LCOV_EXCL_LINE", this=0x0) at /usr/include/c++/4.8.2/bits/basic_string.h:1848
#2 match (line=<error reading variable: Cannot access memory at address 0x0>, lineNr=<optimized out>, filePath=..., this=0x24eb0d0) at /data/mlt/kcov/src/filter.cc:144
#3 Filter::runLineFilters (this=<optimized out>, filePath=..., lineNr=<optimized out>, line=<error reading variable: Cannot access memory at address 0x0>) at /data/mlt/kcov/src/filter.cc:107
#4 0x000000000043dcca in Reporter::onLine (this=0x24eae90, file="/checkout/src/libcore/lib.rs", lineNr=1, addr=93824992326304) at /data/mlt/kcov/src/reporter.cc:328
#5 0x0000000000450892 in ElfInstance::onLine (this=0x85c020 <g_instance>, file=..., lineNr=1, addr=94880) at /data/mlt/kcov/src/parsers/elf-parser.cc:692
#6 0x0000000000453d7d in kcov::DwarfParser::forEachLine (this=this@entry=0x7fff4bebd430, listener=...) at /data/mlt/kcov/src/parsers/dwarf.cc:112
#7 0x00000000004509b0 in ElfInstance::parseOneDwarf (this=this@entry=0x85c020 <g_instance>, relocation=relocation@entry=93824992231424) at /data/mlt/kcov/src/parsers/elf-parser.cc:420
#8 0x0000000000452b2d in ElfInstance::doParse (this=0x85c020 <g_instance>, relocation=93824992231424) at /data/mlt/kcov/src/parsers/elf-parser.cc:305
#9 0x000000000045535b in checkSolibData (this=0x24eb9d0) at /data/mlt/kcov/src/solib-handler.cc:228
#10 SolibHandler::onTick (this=0x24eb9d0) at /data/mlt/kcov/src/solib-handler.cc:78
#11 0x000000000041ab0c in tick (this=0x24eb220) at /data/mlt/kcov/src/collector.cc:72
#12 Collector::run (this=0x24eb220, filename=...) at /data/mlt/kcov/src/collector.cc:53
#13 0x0000000000418d6a in main (argc=<optimized out>, argv=<optimized out>) at /data/mlt/kcov/src/main.cc:298
Looks like a null pointer dereference to me - it seems that FileLineHandler::match doesn't cope with being passed a null string for line. Do shout if you want me to send you the core (I suspect it's too large to attach here).
Yes, that sounds like it!
Is /checkout/src/libcore/lib.rs present on your system?
Anyway, with that finding, it should be pretty easy to fix the problem. Good catch!
Hmm... match takes a reference for line, so it really shouldn't be null. Could you mail me the source code of lib.rs? simon.kagstrom@gmail.com is the address.
I'll see if I can reproduce this in a unit test, although I'll have to warn you that it might be a few days until I look at this again.
Well, libcore/lib.rs is from the core Rust library, so not something I have my own copy of. I imagine it's this file though!
Well, something is strange if the source code isn't present on your system: then kcov shouldn't be able to read the file to start with. I've read the code of source-file-cache.cc again, but I can't see how it could end up in this situation.
Could you mail me the core file?
Thanks Simon - I've sent you a mail.
Unfortunately, I guess I would need your kcov binary as well (matching the core file), as I don't get any symbols in the backtrace. One other thing though: could you build kcov in debug mode, i.e., configure it with cmake -DCMAKE_BUILD_TYPE=Debug <...>? Perhaps the backtrace would be easier to follow in that case. At least the "optimized out" stuff should be gone then.
The debug build is a good call - I may have got somewhere with that:
(gdb) up 3
#3 0x00000000004f2bfb in Reporter::onLine (this=0x1e14f20, file="/checkout/src/libcore/lib.rs", lineNr=1, addr=93824992326304) at /data/mlt/kcov/src/reporter.cc:328
328 !m_filter.runLineFilters(file, lineNr, lines[lineNr - 1]))
(gdb) p file
$10 = "/checkout/src/libcore/lib.rs"
(gdb) p *fp
$11 = {m_fileHash = 6018595602966914235, m_lines = std::vector of length 2, capacity 2 = {0x0, 0x2a451c0}, m_nrLines = 1}
(gdb) p *line
$12 = {m_addrs = std::vector of length 0, capacity 0, m_lineId = 2419036335428861953, m_order = 0, m_unreachable = false}
(gdb) p lines
$13 = std::vector of length 0, capacity 0
I wonder if the fact that file doesn't actually exist at the path specified is the problem? The /checkout/src/libcore path is part of the Rust compiler code itself, not my code - but my Rust library does have a compiler plugin.
Looks like if that is the case, then the call to ISourceFileCache::getInstance().getLines(file) in reporter.cc is going to return an empty vector...
Well, it should return an empty vector, but then the number of lines should be zero, and not 1 as in this case. I think that's probably the root of the problem.
Not at the computer now, but I think the m_empty member in source-file-cache.cc maybe isn't set up correctly.
OK, so this issue with the null reference is a bit of a red herring. :(
The test I see at that commit is:
if (lines.size() >= lineNr - 1 &&
!m_filter.runLineFilters(file, lineNr, lines[lineNr - 1]))
And lines.size() will be 0, but lineNr is 1. If I change the code to explicitly also cover the case when lines.size() is 0, then everything passes...
BUT you've fixed this in master, by switching to a for loop:
for (unsigned int nr = 1; nr <= lines.size(); nr++) {
if (!m_filter.runLineFilters(file, nr, lines[nr - 1]))
And running from master, kcov itself doesn't segfault - the instrumented binary does.
However, I switched back to running with --debug=15 and that fails even back on v33!
So I suspect that this second bug isn't a regression - it's always been a problem, but somehow before the introduction of the line filters it only manifested with debug tracing on.
The end of the debug trace shows:
REPORT hit at 0x7ffff71050db
PT continuing 13400 with signal 0
PT stopped PID 13400 0x00000b7f
PT signal 11 at 0x7ffff71050da for 13400
PT continuing 13400 with signal 11
PT stopped PID 13400 0x0000008b
PT terminating signal 11 at 0x7ffff71050da for 13400
And 0x7ffff71050db is inside malloc()...
OK, too bad - I had forgotten about that change myself. Sounds like some kind of race then, but it's difficult to say where it might be.
Things to test:
- Run kcov with --skip-solibs
- Does the crashing code have source code? (I guess it's the code calling malloc from the Rust libs.) If so, maybe try filtering it out with --exclude-path
- Is the crash the same with --debug=15 for both v33 and master?
--skip-solibs doesn't crash, but that's because of how Rust test binaries are built - by default everything is a library, and the test harness builds a minimal wrapper around it. So while it doesn't crash, the coverage report is empty.
Yes, I get the same problem on master and v33.
Using --exclude-path makes no difference, but using --include-path $(pwd) to only cover my code DOES work. That's a workaround I can use easily enough - so if you want to close this issue, then go ahead!
One other thought - it's possible that this is actually a Rust bug of course, and rustc is generating bad debugging information in the binary...
Well, normally you'd want to only cover your own code anyway, but it should still work I think.
Does --verify report any skipped breakpoints?
Sorry for the delayed reply - --verify does report skipped breakpoints, but that happens for all Rust crates. I wonder if this is just another symptom of https://github.com/rust-lang/rust/issues/27324 ?
OK, good. At least that's working as it should then.
It almost sounds like a timing issue I'd say. kcov (as you've no doubt seen!) tends to slow down execution quite a bit, and maybe that triggers some race somewhere?
Hi, just dropping by to say that I'm affected by this too. Looks like a rustc bump (1.19.0 → 1.20.0) broke things completely:
Last working build: https://travis-ci.org/zetok/tox/jobs/270343484#L2215 Build with segfault: https://travis-ci.org/zetok/tox/jobs/271103907#L2367
Does the workaround @mthebridge uses (--include-pattern=<path-to-your-own-source-code>) help for you as well?
@SimonKagstrom That works, thanks!
> One other thought - it's possible that this is actually a Rust bug of course, and rustc is generating bad debugging information in the binary...
FWIW, I dealt with a similar situation with Rust code in Firefox, and even contributed a patch to rustc to make things better. Generics in Rust do not actually get compiled until you instantiate them with a concrete type, and the Rust standard library has a lot of generics. This means that when you use something like Option<Foo>, where Foo is defined in your code, the Rust compiler will compile that object code into your binary when your crate is compiled. Additionally, since Rust statically links the standard library by default, the rest of the standard library winds up in your final binary as well. This means that there are a lot of things in your binary for which the source paths map back to the Rust standard library, in whatever path it was built in for the Rust toolchain you're using, and you are not going to have source for those.
It's possible to encounter this in C code, but it's less likely, since many C libraries are dynamically linked. If you were to link an external static library that contained debug info, I'm pretty sure you'd hit the same issue, though.
Thanks for the insight into Rust!
Still, I don't think the missing source code as such is really the problem - that's quite common for C/C++ as well, and presents no problem by itself. But the volume of instrumented code might certainly increase the chances of triggering a case of bad debug info.
This problem would have been so much simpler if x86 - with variable instruction lengths - weren't so prevalent.
I was hit by this issue today.
The steps to reproduce are simple:
$ cargo new --lib foo && cd foo && cargo test
...
$ export TEST_BIN=`find target/debug/ -executable -type f | grep 'target/debug/foo'`
$ /tmp/kcov/build/src/kcov target/cov $TEST_BIN
kcov: Process exited with signal 11 (SIGSEGV) at 0x5555555b9497
Versions of the software I use:
$ cargo --version
cargo 0.26.0-nightly (1d6dfea44 2018-01-26)
$ /tmp/kcov/build/src/kcov --version
kcov v34-25-g894e
$ uname -a
Linux retina 4.14.22-1-lts #1 SMP Sun Feb 25 15:13:49 CET 2018 x86_64 GNU/Linux
@Dushistov try running kcov with --verify, it seems to be needed pretty much always with Rust binaries.
@SimonKagstrom Sorry, forgot to mention - I've tried both variants, but I'd hope that it shouldn't crash even without any flags.
$ /tmp/kcov/build/src/kcov --verify target/cov target/debug/foo-7cf2e66c10f1fab7
kcov: 213 invalid breakpoints skipped in /tmp/foo/target/debug/foo-7cf2e66c10f1fab7
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
$ /tmp/kcov/build/src/kcov --include-path . target/cov target/debug/foo-7cf2e66c10f1fab7
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
So it works with --verify and limiting the source path? Issue #82 explains in a bit more detail why verify is needed; it's basically a workaround for broken DWARF info. Limiting the source path probably means that the locations where the DWARF info points to the middle of instructions are simply ignored.
I agree it should work without any extra options, but since this appears to be quite common (not only with Rust), the only alternative would be to make --verify the default. That would slow down execution fairly significantly for the cases where it's not needed, though.
Perhaps a warning noting that if it's not needed, --noverify can be passed?
The segfault occurs when you don't have libiberty-dev in your packages. To fix it (in .travis.yml):
addons:
  apt:
    update: true
    packages:
      - libiberty-dev
For some reason, kcov can build without libiberty, but then segfaults at runtime.
@fschutt kcov detects whether binutils-dev is installed, and only then builds with disassembly support, so it's optional.
It does give a warning if you run with --verify without the disassembler, though. Rust binaries appear to be especially sensitive to this, so they typically don't work at all without discarding the invalid breakpoints.
@Arzte running without --verify is the default, so if you don't give any options it will run without verification. Sorry about not seeing your reply until now!
I'm hitting an issue where running a Rust test binary under kcov segfaults, but running the binary natively is fine.
Obviously, due to the way kcov works I can't run it in a debugger, but if I dump a core and load that in gdb I can see the segfault is somewhere in jemalloc, in thread initialization.
Unfortunately I can't seem to generate a simple example for repro, and I can't share the underlying code, but running under kcov v33 works as expected, indicating this is a regression - and indeed git bisect found that the commit at fault is e620da497dfcf93eefef6e582c68b68b9303f84c, which fixed #180. Annoyingly, the reason I want to run off master and not from the v33 release is that I want to be able to exclude lines within my project! I appreciate there's not a lot to go on here - let me know if there are any useful diagnostics I can send!