caohaiwd / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

test suite failures on 1.5 and trunk, Mac OS X 10.6.3 #243

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. configure
2. make
3. make check

What is the expected output? What do you see instead?
0 tests should fail. Instead, 3 out of 38 tests fail.

What version of the product are you using? On what operating system?
Tried both 1.5 and trunk on Mac OS X 10.6.3.

Please provide any additional information below.

Running the 'heap-profiler_unittest' binary manually gives this output:

Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use 
--
heap_check="" to avoid this conflict.
Abort trap

Original issue reported on code.google.com by neunon on 20 May 2010 at 11:15

Attachments:

GoogleCodeExporter commented 9 years ago
Interesting.  I actually have a 10.6.3 system to test on, and these tests pass 
just 
fine there.

Is there any way for you to run this in the debugger, and maybe put a 
breakpoint at 
the point in the code that sets the malloc hooks (in malloc_hook.h)?  Maybe 
that will 
give some insight as to what's going on for you.  You'll probably want to 
reconfigure 
with
   ./configure CXXFLAGS=-g

Original comment by csilv...@gmail.com on 20 May 2010 at 11:33

GoogleCodeExporter commented 9 years ago
For some reason, it passes all 38 tests with CXXFLAGS=-g. However, 
debugallocation_test throws up a crash 
dialog, despite passing all tests. The crash seems to only be triggered when 
running it through 
./debugallocation_test.sh (or via 'make check'). Running ./debugallocation_test 
manually does not crash. And the 
test passes, so that's weird. Stack trace attached.

I'm really confused why those three tests fail without -g, though...

Original comment by neunon on 21 May 2010 at 12:33

Attachments:

GoogleCodeExporter commented 9 years ago
Also tried a rebuild after doing 'ccache -zC'. I get the same weird result. 
CXXFLAGS=-g fixes everything but 
debugallocation_test, and things still fail without CXXFLAGS=-g.

Original comment by neunon on 21 May 2010 at 12:44

GoogleCodeExporter commented 9 years ago
} However, debugallocation_test throws up a crash dialog, despite passing all 
tests.

That's expected.  That test does several things which are supposed to crash.  
The 
driver .sh script tests that they crash as expected.

See what happens if you run with CXXFLAGS="-g -O1".  It seems to be some 
trouble due 
to optimization, but I'm not sure what it might be.

Original comment by csilv...@gmail.com on 21 May 2010 at 1:10

GoogleCodeExporter commented 9 years ago
The same tests fail with "-g -O1".

Original comment by neunon on 21 May 2010 at 5:38

GoogleCodeExporter commented 9 years ago
Good to know.  Is it possible for you to run one of the failing tests under 
gdb, and 
try to figure out where the code is setting the malloc hooks?  Hopefully it 
will be 
possible to get ok debugging results with -O1 (-O2 can be very difficult to 
debug).

Original comment by csilv...@gmail.com on 21 May 2010 at 6:02

GoogleCodeExporter commented 9 years ago
Not being familiar with the perftools internals, where would be the most 
revealing locations to set breakpoints? I 
assume MallocHook_SetNewHook and MallocHook_SetDeleteHook are the only places 
that ever set hooks for 
new/delete, right?

Original comment by neunon on 21 May 2010 at 11:48

GoogleCodeExporter commented 9 years ago
Exactly right.

Original comment by csilv...@gmail.com on 22 May 2010 at 12:15

GoogleCodeExporter commented 9 years ago
I doubt this helps at all:

(gdb) run
Starting program: 
/Users/tycho/Development/google-perftools-read-only/.libs/heap-profiler_unittest

Reading symbols for shared libraries +++. done
Starting tracking the heap

Breakpoint 1, MallocHook_SetNewHook (hook=0x1000d38ab <NewHook(void const*, 
unsigned long)>) at 
src/malloc_hook.cc:156
156   return new_hook_.Exchange(hook);
(gdb) bt
#0  MallocHook_SetNewHook (hook=0x1000d38ab <NewHook(void const*, unsigned 
long)>) at 
src/malloc_hook.cc:156
#1  0x00000001000c9f62 in HeapProfilerStart (prefix=0x100584018 
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at 
src/heap-profiler.cc:511
#2  0x00000001000016ba in std::string::_M_rep () at 
/usr/include/c++/4.2.1/bits/basic_string.h:88
#3  0x00000001000016ba in ~basic_string [inlined] () at 
/usr/include/c++/4.2.1/bits/basic_string.h:493
#4  ~basic_string [inlined] () at /usr/include/c++/4.2.1/bits/basic_string.h:88
#5  TestHeapProfilerStartStopIsRunning [inlined] () at 
/Users/tycho/Development/google-perftools-read-
only/src/tests/heap-profiler_unittest.cc:493
#6  0x00000001000016ba in main (argc=<value temporarily unavailable, due to 
optimizations>, 
argv=0x7fff5fbfde20) at src/tests/heap-profiler_unittest.cc:130
(gdb) cont
Continuing.
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use 
--heap_check="" to avoid 
this conflict.

Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb) bt
#0  0x00007fff80b9b886 in __kill ()
#1  0x00007fff80c3beae in abort ()
#2  0x00000001000c9a04 in LogPrintf [inlined] () at 
/Users/tycho/Development/google-perftools-read-
only/src/base/logging.h:198
#3  0x00000001000c9a04 in RAW_LOG (lvl=-4, pat=<value temporarily unavailable, 
due to optimizations>) 
at logging.h:217
#4  0x00000001000c9f7d in MallocHook::SetDeleteHook () at 
/Users/tycho/Development/google-perftools-
read-only/src/google/malloc_hook.h:514
#5  0x00000001000c9f7d in HeapProfilerStart (prefix=0x100584018 
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at 
src/heap-profiler.cc:516
#6  0x00000001000016ba in std::string::_M_rep () at 
/usr/include/c++/4.2.1/bits/basic_string.h:88
#7  0x00000001000016ba in ~basic_string [inlined] () at 
/usr/include/c++/4.2.1/bits/basic_string.h:493
#8  ~basic_string [inlined] () at /usr/include/c++/4.2.1/bits/basic_string.h:88
#9  TestHeapProfilerStartStopIsRunning [inlined] () at 
/Users/tycho/Development/google-perftools-read-
only/src/tests/heap-profiler_unittest.cc:493
#10 0x00000001000016ba in main (argc=<value temporarily unavailable, due to 
optimizations>, 
argv=0x7fff5fbfde20) at src/tests/heap-profiler_unittest.cc:130
(gdb)

So I modified the source to find out which hook was failing (instead of if 
(setnewhook || setdeletehook), I split 
it into two if statements and printed clearly different messages). The 
MallocHook_SetNewHook is definitely the 
one that triggered the fatal error.

I notice this:

AtomicPtr<MallocHook::NewHook>    new_hook_ = {
  reinterpret_cast<AtomicWord>(InitialMallocHook_New) };
AtomicPtr<MallocHook::DeleteHook> delete_hook_ = { 0 };

I'm not entirely certain why new_hook_ has this InitialMallocHook_New. It seems 
odd, since 
InitialMallocHook_New uninstalls itself on the first call, so why wouldn't 
new_hook_ be NULL to start? I tried 
initializing new_hook_ to '0', and I get this:

(gdb) run
Starting program: 
/Users/tycho/Development/google-perftools-read-only/.libs/heap-profiler_unittest

Reading symbols for shared libraries +++. done
Starting tracking the heap
Starting tracking the heap
DONE.

Program exited normally.
(gdb)

Have I simply side-stepped the issue, or was this the actual bug? It seems far 
too simple.

Original comment by neunon on 22 May 2010 at 1:05

GoogleCodeExporter commented 9 years ago
Oh, and changing new_hook_ to be initialised to '0' gives me:

===================
All 38 tests passed
===================

Original comment by neunon on 22 May 2010 at 1:13

GoogleCodeExporter commented 9 years ago
} AtomicPtr<MallocHook::NewHook>    new_hook_ = {
}   reinterpret_cast<AtomicWord>(InitialMallocHook_New) };

Aha!  It's a problem with weak symbols.  Maybe.  That makes sense --
it's the kind of thing one wouldn't be surprised to break from one OS
(or libc) version to another.

InitialMallocHook_New *is* a kind of trivial function in most cases,
but not when linking with heap-checker.cc.  This is kind of magical
behavior, which is implemented via this line in malloc_hook.cc:

   ATTRIBUTE_WEAK
   extern void InitialMallocHook_New(const void* ptr, size_t size);

The comments above this line explain a bit about what's going on.

Now the question is why ATTRIBUTE_WEAK is doing something wonky for
you.  First, let's confirm that it is.  If you can run in gdb, the
easiest would be to do something like 'b InitialMallocHook_New', and
see what function the breakpoint is put in.  That will tell us what
version the linker chose.

If that's not practical, you can put printf statements in
InitialMallocHook_New in both malloc_hook.cc and heap-checker.cc, and
see which printf is actually getting printed.

} The MallocHook_SetNewHook is definitely the one that triggered the
} fatal error.

Actually, your backtrace is showing it's from SetDeleteHook:

} #3  0x00000001000c9a04 in RAW_LOG (lvl=-4, pat=<value temporarily
} unavailable, due to optimizations>)
} at logging.h:217
} #4  0x00000001000c9f7d in MallocHook::SetDeleteHook () at
} /Users/tycho/Development/google-perftools-
} read-only/src/google/malloc_hook.h:514

Am I misreading this?  (I must be, if changing new_hook_ initialization fixes 
your 
problem.)  In any case, best to repeat the above (gdb or
printf) with InitialMallocHook_Delete too.  Let me know what it shows
up.

Original comment by csilv...@gmail.com on 24 May 2010 at 3:51

GoogleCodeExporter commented 9 years ago
Well, this is, uh. Interesting:

(gdb) b InitialMallocHook_New
Breakpoint 1 at 0x20c49ba5e24515: file atomicops.h, line 212.
(gdb) run
Starting program: 
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest 
Reading symbols for shared libraries +++. done

Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=1) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb)

It set the breakpoint in atomicops.h above, but then actually tripped the 
breakpoint in malloc_hook.cc:218.

It also looks like it never un-sets itself:

(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=1) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100581000, size=16) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=8) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100582000, size=30) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=8) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100583000, size=75) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.

Breakpoint 1, InitialMallocHook_New (ptr=0x100584000, size=125) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use 
--heap_check="" to avoid 
this conflict.

Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb) bt  
#0  0x00007fff80b9b886 in __kill ()
#1  0x00007fff80c3beae in abort ()
#2  0x00000001000ca698 in RAW_LOG (lvl=-4, pat=0x6 <Address 0x6 out of bounds>) 
at logging.h:198
#3  0x00000001000caaf9 in HeapProfilerStart (prefix=0x100584018 
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at 
src/heap-profiler.cc:515
#4  0x0000000100001710 in TestHeapProfilerStartStopIsRunning [inlined] () at 
/Users/tycho/Development/google-perftools/src/tests/heap-profiler_unittest.cc:88
#5  0x0000000100001710 in main (argc=<value temporarily unavailable, due to 
optimizations>, 
argv=0x7fff5fbfe040) at src/tests/heap-profiler_unittest.cc:130
(gdb)

Original comment by neunon on 24 May 2010 at 4:29

GoogleCodeExporter commented 9 years ago
} It set the breakpoint in atomicops.h above, but then actually tripped the 
breakpoint
} in malloc_hook.cc:218.

I wouldn't worry too much about that -- that's just optimizer goodness, I'm 
guessing.

} It also looks like it never un-sets itself:

Instead of just continuing at the breakpoint, can you hit something like the 
following
   (gdb) bt
   (gdb) s
   (gdb) finish
   [should print the return value from MallocHook::GetNewHook()]
   (gdb) print InitialMallocHook_New
   (gdb) n
   (gdb) n

This should give us insight into what the code is doing.  My guess is that 
GetNewHook 
is set to some other value rather than InitialMallocHook_New, but is maybe 
calling 
this InitialMallocHook_New anyway?

Original comment by csilv...@gmail.com on 24 May 2010 at 7:12

GoogleCodeExporter commented 9 years ago
I executed some of your commands rather blindly. I'm somewhat of a gdb flunky 
(I was spoiled by Visual C++ 
for years):

(gdb) bt
#0  InitialMallocHook_New (ptr=0x100580000, size=1) at src/malloc_hook.cc:218
#1  0x00000001000dbfa3 in tc_malloc (size=1) at malloc_hook-inl.h:98
#2  0x00000001000c92ad in TCMallocGuard::TCMallocGuard (this=<value temporarily 
unavailable, due to 
optimizations>) at src/tcmalloc.cc:766
#3  0x00000001000c9305 in TCMallocGuard::TCMallocGuard (this=<value temporarily 
unavailable, due to 
optimizations>) at src/tcmalloc.cc:781
#4  0x00000001000c936a in __static_initialization_and_destruction_0 
(__initialize_p=<value temporarily 
unavailable, due to optimizations>, __priority=<value temporarily unavailable, 
due to optimizations>) at 
src/tcmalloc.cc:794
#5  0x00000001000c93b6 in global constructors keyed to 
_ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmall
oc_large_alloc_repor
t_thresholdE () at src/tcmalloc.cc:1486
#6  0x00007fff5fc0d500 in 
__dyld__ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE 
()
#7  0x00007fff5fc0bcec in 
__dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#8  0x00007fff5fc0bc9d in 
__dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#9  0x00007fff5fc0bda6 in 
__dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextE ()
#10 0x00007fff5fc0210e in __dyld__ZN4dyld24initializeMainExecutableEv ()
#11 0x00007fff5fc06981 in __dyld__ZN4dyld5_mainEPK12macho_headermiPPKcS5_S5_ ()
#12 0x00007fff5fc016d2 in 
__dyld__ZN13dyldbootstrap5startEPK12macho_headeriPPKcl ()
#13 0x00007fff5fc01052 in __dyld__dyld_start ()
(gdb) s
MallocHook::GetNewHook () at 
/Users/tycho/Development/google-perftools/src/malloc_hook-inl.h:212
212       reinterpret_cast<volatile const AtomicWordCastType*>(ptr));
(gdb) finish

Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=1) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
Value returned is $1 = (void (*)(const void *, size_t)) 0x7fffffe00180 
<__memory_barrier>
(gdb) print InitialMallocHook_New
$2 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void 
const*, unsigned long)>
(gdb) n
InitialMallocHook_New (ptr=<value temporarily unavailable, due to 
optimizations>, size=<value temporarily 
unavailable, due to optimizations>) at src/malloc_hook.cc:220
220 }
(gdb) n
tc_malloc (size=1) at src/tcmalloc.cc:1303
1303    }
(gdb)

Original comment by neunon on 24 May 2010 at 7:20

GoogleCodeExporter commented 9 years ago
Thanks, that's helpful.  The 'finish' output is a bit suspicious: it looks like 
it's 
returning a pointer to _memory_barrier, not to InitialMallocHook_New.  I don't 
know 
if that's an optimization artifact or not, though.  I'd guess not, based on the 
fact 
the == comparison is failing: code skips from line 218 to 220, never executing 
line 
219.  So the prime theory now is that MallocHook::GetNewHook() is giving weird 
(incorrect) results.

When you're in gdb and are at this breakpoint, can you print the value of 
MallocHook::new_hook_?  Also, try running
   (gdb) print MallocHook::GetNewHook()
and if that succeeds, see if the two values match.  And might as well print 
InitialMallocHook_New, while you're at it.

btw, you're on an intel system, right?  I think all recent OS X releases are 
intel-
only (i386 or x86_64).  That may make a difference, if you're not.

Here's another thing you might try: put a breakpoint at __dyld__dyld_start.  In 
that 
breakpoint, print the value of MallocHook::new_hook_ (if it lets you).  Then 
you can 
continue to your next breakpoint (the one you have now) and see if 
MallocHook::new_hook_ has the same value.

Original comment by csilv...@gmail.com on 24 May 2010 at 8:06

GoogleCodeExporter commented 9 years ago
I'm running an Intel mac. It's one of the early 2007 ones, though, so it is 
running a 32-bit kernel and 64-bit 
userspace. Doubt that affects anything though.

And I seem to be spinning my wheels as far as gdb is concerned:

(gdb) inspect InitialMallocHook_New
$3 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void 
const*, unsigned long)>
(gdb) inspect new_hook_
No symbol "new_hook_" in current context.
(gdb) inspect MallocHook::new_hook_
There is no field named new_hook_
(gdb) inspect MallocHook::GetNewHook()

Breakpoint 1, InitialMallocHook_New (ptr=0x1000d550c, size=1) at 
src/malloc_hook.cc:218
218    if (MallocHook::GetNewHook() == &InitialMallocHook_New)
The program being debugged stopped while in a function called from GDB.
When the function (Acquire_Load) is done executing, GDB will silently
stop (instead of continuing to evaluate the expression containing
the function call).
(gdb) cont
Continuing.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fbfbb50
0x00007fff5fbfbb50 in ?? ()
(gdb) bt
#0  0x00007fff5fbfbb50 in ?? ()
#1  0x758d48087d8b48f0 in ?? ()

Original comment by neunon on 25 May 2010 at 12:03

GoogleCodeExporter commented 9 years ago
Sorry this is a bit of work to track down.  But I'm confident we'll get to the 
bottom 
of it.

} inspect MallocHook::new_hook_

Sorry, I misread how this is declared.  It's base::internal::new_hook_, not 
MallocHook::new_hook_.

} inspect MallocHook::GetNewHook()

Try disabling the breakpoint ((gdb) disable 1) before running this command.

Original comment by csilv...@gmail.com on 25 May 2010 at 4:07