cuitao2046 / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Test for distinct address spaces is broken #481

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What version of the product are you using? On what operating system?

32-bit Linux, tcmalloc 2.0, GCC 4.7.1, eglibc 2.13

Please provide any additional information below.

Whilst trying to debug a crash in heap-checker_unittest.sh, I came across some 
very strange code in src/base/linuxthreads.cc, which attempts to detect whether 
a related process is a fork or a thread by trying to determine whether the 
address spaces are distinct.  The offending code is as follows (lines 446-458):

  if (sys_ptrace(PTRACE_PEEKDATA, pid, &i, &j) || i++ != j ||
      sys_ptrace(PTRACE_PEEKDATA, pid, &i, &j) || i   != j) {
    /* Address spaces are distinct, even though both
     * processes show the "marker". This is probably
     * a forked child process rather than a thread.
     */
    sys_ptrace_detach(pid);
    num_threads--;
    sig_num_threads = num_threads;
  } else {
    found_parent |= pid == ppid;
    added_entries++;
  }

As far as I can tell, what this is trying to do is peek at the other process's 
address space to see if the memory at &i is the same as in the current process, 
then frob i, and see whether &i in the other process changes as well.  This is 
broken for two reasons:

1. It appears to assume that the PTRACE_PEEKDATA will load the data at the 
address in the 3rd argument, into the variable given as the 4th argument, and 
return 0 on success (which will evaluate to false, and continue evaluating the 
statement after the ||).  This is not true - according to the man page, the 4th 
argument is ignored, and the *return value* is the data, with errno used to 
indicate success/failure.

2. Even if the PTRACE_PEEKDATA calls are fixed, the test can ever work as 
desired, since i is a local variable and stack variables are always 
thread-local.  The data at address &i in a related process is not guaranteed to 
be equal to i, regardless of whether it is a fork or a thread.

As far as I can tell, the only reason it "works" is because the PTRACE_PEEKDATA 
calls are failing, returning zero (which evaluates to false), and setting errno 
to EFAULT (14), which is then reflected in the output from my failed test run:

  In main(): heap_check=local
  Thread finding failed with -1 errno=14
  Thread finding callback was interrupted or crashed; can't fix this
  ----
  FAIL: heap-checker_debug_unittest.sh

Please find attached a patch which uses a different approach, based on using 
the __WCLONE option with waitpid().  From my testing, this appears to detect 
relatives correctly, including the distinction between forks and processes, 
since the parent process - which the code checks for as a sanity-check - is a 
"real" process, not a clone.

Unfortunately, this patch has not fixed my unit test failure, since the crash 
really is within the callback, not the thread-finding routine.

Original issue reported on code.google.com by philip.a...@smoothwall.net on 1 Nov 2012 at 11:15

Attachments:

GoogleCodeExporter commented 9 years ago
r167 | chappedm@gmail.com | 2012-11-03 10:52:42 -0400 (Sat, 03 Nov 2012) | 4 
lines

issue-481: Replaced test mechanism for distinct address spaces with a more 
reliable mechanism

Rather than using sys_ptrace+PTRACE_PEEK_DATA to determine whether address 
spaces are distinct, we now use sys_waitpid+__WCLONE. See issue-481 for a more 
detailed rationale.

Original comment by chapp...@gmail.com on 3 Nov 2012 at 2:53

GoogleCodeExporter commented 9 years ago

Original comment by chapp...@gmail.com on 3 Nov 2012 at 2:53

GoogleCodeExporter commented 9 years ago
I think this introduced the following race condition which I can reproduce 
occasionally trying to use ListAllProcessThreads against a single-threaded 
application:

- the directory scanner finds the main thread of a single-threaded program (ie 
not a clone)
- it PTRACE_ATTACHes to this main thread, causing it to asynchronously stop.
- the new wait call here, with __WCLONE, will return immediately, because the 
parent pid is not a clone. So, it returns immediately even though the process 
is not yet stopped.
- we finish the iteration over threads, and since we only found the parent, we 
go back and try to do the /proc/* based matching. In the retry step, we first 
try to PTRACE_DETACH from the parent. But, if the async STOP hasn't been 
delivered yet, this DETACH will fail.
- on the next loop through, our attempted call to PTRACE_ATTACH now fails 
because we're already attached.

This ends up causing the attached parent to be orphaned, and the process ends 
up SIGSTOPPED when the tracer process exits.

The fix is to do something like:

                int waited_pid;
                bool is_parent = (pid == ppid);
                int flags = is_parent ? __WALL : __WCLONE;

                while ((waited_pid = sys_waitpid(pid, &status, flags)) < 0) {
                  if (errno != EINTR) {
                    sys_ptrace_detach(pid);
                    num_threads--;
                    sig_num_threads = num_threads;
                    goto next_entry;
                  }
                }
                found_parent |= is_parent;

Original comment by tlip...@gmail.com on 27 Nov 2013 at 10:04

GoogleCodeExporter commented 9 years ago
Have you tried on master or 2.{0,1} ? In master this ticket's change was 
reverted.

Original comment by alkondratenko on 27 Nov 2013 at 10:05

GoogleCodeExporter commented 9 years ago
Ah, I was looking at svn trunk - didn't catch that the project moved to git. 
Thanks for noting that this was reverted.

Original comment by tlip...@gmail.com on 3 Dec 2013 at 10:08