Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

LLD's termination code does not allow code coverage analysis to work properly. #37077

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR38104
Status NEW
Importance P enhancement
Reported by George Rimar (grimar@accesssoftek.com)
Reported on 2018-07-09 07:30:21 -0700
Last modified on 2018-07-09 08:03:50 -0700
Version unspecified
Hardware PC Windows NT
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments exit_race_condition.png (232658 bytes, image/png)
Blocks
Blocked by
See also
Created attachment 20542
callstack

I am not sure we can call it a bug of LLD, though it seems at least something
worth to document. (And maybe to improve somehow).

Currently, LLD uses _exit for error() and fatal() calls:
https://github.com/llvm-mirror/lld/blob/master/Common/ErrorHandler.cpp#L60

That does not allow to test code coverage properly sometimes.

Imagine the following code:

#include <stdlib.h>
#include <unistd.h>

__attribute__((noreturn)) void fatal() {
  exit(0);
}

int main() {
  fatal();
  return 0;
}

If we compile and run it as
clang++ test.c --coverage -fPIC -std=c++11 -fprofile-arcs -ftest-coverage -o
test
./test
We'll see the .gcno and .gcda created and will be able to generate the coverage
report:
~/LLVM/build/bin/llvm-cov gcov test.gcda

But if we change "exit" to "_exit", then no .gcda file is created after running
the app.
That happens because _exit does not do enough cleaning, buffers flushing etc.

The same happens with LLD. If I change "_exit" to "exit" at a mentioned line,
it allows obtaining the correct reports about code coverage.

There is a problem about race conditions though. When using exit() LLD hangs
during
running the tests when multithreading is enabled. I was able to catch
and debug it (reproduces on invalid-cie-length.s test case for me).
Screenshot attached shows how 2 threads are stuck in destructors and waiting
for something.

The issue happens in the following code:
https://github.com/llvm-mirror/llvm/blob/master/lib/Support/Parallel.cpp#L94
At line 103 we call the Task() functor. Here Task() calls fatal() inside, which
is "noreturn" and
calls _exit originally. After changing _exit() to exit(), the code starts to
execute static destructors
and we stuck because of that in the ~ThreadPoolExecutor which waits for ~Latch
infinitely
because line 105 ("Done.dec();") is never executed.

The case above happens because we call fatal() in the parrallel_foreach thread.
That particular fatal() is coming from EhFrame parser:
https://github.com/llvm-mirror/lld/blob/master/ELF/EhFrame.cpp#L46
It should be possible to workaround that race condition if we will use error()
instead of fatal(), which usually does not exit immediately.
It might be enough for building coverage report purposes.
Quuxplusone commented 6 years ago

Attached exit_race_condition.png (232658 bytes, image/png): callstack

Quuxplusone commented 6 years ago
And seems workable fix/workaround is to add __gcov_flush before _exit():

extern "C" void __gcov_flush();
...
void lld::exitLld(int Val) {
  ...
  __gcov_flush();
  _exit(Val);
}