If you could pass along the test program that would be great
(chappedm@gmail.com).
Original comment by chapp...@gmail.com
on 10 Mar 2013 at 8:43
I think I have an explanation of this issue, along with a workaround or two.
The issue is that we use sys_clone() to clone the ListerThread in the same
address space as the triggering thread, and sharing the same thread-local
storage. This usually works out OK, but there's a race condition here in the
dynamic loader (dl_runtime_resolve in the stacks above). If the triggering
thread is resolving a symbol (e.g. sys_waitpid) at the same time as the
ListerThread is resolving one (e.g. sem_wait), the two resolutions can conflict
and trigger this issue.
There's some more gory detail here:
http://www.mail-archive.com/utrace-devel@redhat.com/msg01944.html
I have two workarounds that appear to work:
1) set LD_BIND_NOW=1 in the environment. This forces the dynamic loader to
resolve all symbols at program startup instead of lazily on first call, so you
can't hit this race.
2) make some "junk calls" of the relevant library functions before doing the
sys_clone:
Index: src/base/linuxthreads.cc
===================================================================
--- src/base/linuxthreads.cc (revision 208)
+++ src/base/linuxthreads.cc (working copy)
@@ -614,6 +614,16 @@
* to ListerThread actually executing
*/
if (sem_init(&lock, 0, 0) == 0) {
+ // Workaround for issue #497: call various functions before cloning.
+ // This ensures that they are already loaded, since calling the dynamic
+ // loader from our CLONE_VM thread can race against the dynamic loader
+ // in the enclosing thread.
+ sem_post(&lock);
+ sem_wait(&lock);
+ prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
+ // First do a junk waitpid to trigger dynamic loading of waitpid
+ int junk;
+ sys0_waitpid(getpid(), &junk, WNOHANG);
int clone_errno;
clone_pid = local_clone((int (*)(void *))ListerThread, &args);
You may not need all of the above junk calls -- perhaps just the junk
sys0_waitpid, so that the waitpid from the triggerer doesn't cause any dynamic
loading. But with the above in place, I can't seem to repro the issue.
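For anyone who wants to try the same idea outside of linuxthreads.cc, here is a minimal standalone sketch of the "junk calls" approach. The helper name prebind_for_clone is made up for illustration and is not part of gperftools, and it uses the plain libc waitpid rather than the sys0_waitpid wrapper from the patch, so treat it as an approximation of the workaround rather than the actual fix:

```c
#define _GNU_SOURCE
#include <semaphore.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of gperftools): call each library function
 * once from the parent thread so that its lazy-binding entry is already
 * resolved before a CLONE_VM thread is started.
 * Returns 0 on success, -1 if the scratch semaphore could not be used. */
int prebind_for_clone(void) {
    sem_t lock;
    if (sem_init(&lock, 0, 0) != 0)
        return -1;

    /* Post first so that the following wait returns immediately. */
    if (sem_post(&lock) != 0 || sem_wait(&lock) != 0) {
        sem_destroy(&lock);
        return -1;
    }
    sem_destroy(&lock);

    /* Result is ignored; only the first-call symbol resolution matters. */
    (void)prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);

    /* Junk waitpid on our own pid. It is expected to fail (we are not our
     * own child), which is fine: the call itself triggers the resolution. */
    int junk = 0;
    (void)waitpid(getpid(), &junk, WNOHANG);

    return 0;
}
```

You would call prebind_for_clone() once, right before the clone. If the binary already runs with LD_BIND_NOW=1 these calls are redundant but harmless.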
Original comment by tlip...@gmail.com
on 25 Mar 2013 at 9:39
Also, it's likely this only affects some applications -- I think glibc will
only do the SSE register save/restore when it sees that there is at least
one function using SSE registers as part of its calling convention (the
application I'm testing does use SSE).
Original comment by tlip...@gmail.com
on 25 Mar 2013 at 9:44
Original issue reported on code.google.com by
tlip...@gmail.com
on 10 Feb 2013 at 6:28