Closed GoogleCodeExporter closed 9 years ago
It looks like the deadlock is because of the assertion failure -- the program
is
crashing in the middle of a dump, and at exit it tries to dump again, so no
surprise
there's deadlock. The question is why is the check failing that regions_ !=
NULL.
It's true that HEAP_PROFILE_MMAP is not the most well-tested part of perftools.
(We
don't use it at all within Google.) I suspect, though, that that's not what is
causing you to not see any memory in the heap-profile dump. Unless your
application
never uses malloc or new, and only uses mmap to allocate memory?
So while this is a likely a bug somewherewith HEAP_PROFILE_MMAP, the more
useful bug
to you, I expect, is why you're not seeing any other memory in the dump. Do
you see
anything in the dump -- hex addresses maybe?
If you can run without HEAP_PROFILE_MMAP, and attach the resulting profile
file, I'll
take a look. If it's possible for you to attach a minimal program that
illustrates
this problem, that would be great as well.
Original comment by csilv...@gmail.com
on 8 Jan 2010 at 5:43
Ok, I have not immediate nor anticipated need for HEAP_PROFILE_MMAP, this was
simply trying to find a workaround :)
The app is strictly using system malloc, sometimes via wrappers, but that
shouldn't matter.
I've just tried to reproduce the missing memory with a simplistic C program,
but of course there it works as expected. :(
I'll try again with a shared library, as that's how most of our code is loaded.
Below is the pprof output of the attached file.
It was created by starting mysql-proxy as originally described, but without
HEAP_PROFILE_MMAP, then using the mysql client app to
connect, execute 'SELECT 1', disconnect. This most certainly involved several
malloc/free calls.
This test is still single threaded.
The dump was also created by sending USR1 to the process (all that does is to
set a flag that causes HeapProfilerDump() to be
executed the next time through the runloop).
There is indeed a crash at program exit, which only happens with tcmalloc's
free, not system free. I'll investigate, but it is unrelated to
this problem (might well be a genuine bug in our code).
classdump:mysql-proxy kroepke$ ~/local/tcmalloc/bin/pprof --text
libexec/mysql-proxy empty.hprof.0001.heap
Total: 0.0 MB
0.0 81.8% 81.8% 0.0 81.8% std::basic_string::_Rep::_S_create
0.0 18.2% 100.0% 0.0 100.0% __static_initialization_and_destruction_0
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe01037
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe01872
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe04102
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe07bcf
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe0e775
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe0e7e3
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe0e8c9
0.0 0.0% 100.0% 0.0 100.0% 0x000000008fe12f36
0.0 0.0% 100.0% 0.0 100.0% __mh_execute_header
0.0 0.0% 100.0% 0.0 81.8% std::basic_string::_S_construct
0.0 0.0% 100.0% 0.0 81.8% std::basic_string::basic_string
Original comment by kroe...@gmail.com
on 8 Jan 2010 at 6:06
Attachments:
Note that the heap profiler only shows memory *currently allocated* at the time
of the
dump. So if SELECT 1 allocated memory and then freed it all, that wouldn't
show up in
the heap profile.
} I'll try again with a shared library, as that's how most of our code is
loaded.
One issue is that tcmalloc has to be linked in last; stuff linked in after it
won't be
seen by tcmalloc. Is that possibly at issue here?
Original comment by csilv...@gmail.com
on 8 Jan 2010 at 6:15
I should've phrased that differently, yes: There is be a memory trace after the
test, so I should've been able to see it.
Since we dlopen plugins that do most of the allocation, I've recreated that
test case with a single function that allocates memory, see the attached files.
That memory is all hidden behind __mh_execute_header, which I guess means
tcmalloc won't help me.
I'll see if changing the linking order helps, though I suspect the dlopen makes
that point moot, unless I find a way to do it with some dlyd preloading and
dlsym magic for finding the relevant function pointers. Although that's
extremely likely to be a pain.
What's interesting is that in the mysql-proxy case, the __mh_execute_header
portion is indeed 0.0 MB where it has to be way more than that. I can see
nothing fundamentally different to what the test case does (in terms of linking
and dlopen order in time).
For completeness' sake, the compiler/linker steps:
$ gcc -fPIC -O0 -g -I/Users/kroepke/local/tcmalloc/include -c leak.c
$ gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o leak.bundle -lc leak.o
$ gcc -o ltc-dyn leak-main.c -O0 -ggdb -Wl,-undefined,dynamic_lookup \
-ltcmalloc -I/Users/kroepke/local/tcmalloc/include -L/Users/kroepke/local/tcmalloc/lib -I.
$ HEAP_PROFILE=dyn ./ltc-dyn
leads to:
classdump:tmp kroepke$ ~/local/tcmalloc/bin/pprof --text ltc dyn.0002.heap
Total: 3.4 MB
3.4 100.0% 100.0% 3.4 100.0% _main
0.0 0.0% 100.0% 0.0 0.0% std::basic_string::_Rep::_S_create
0.0 0.0% 100.0% 0.0 0.0% __static_initialization_and_destruction_0
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe01037
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe01872
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe04102
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe07bcf
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe0e775
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe0e7e3
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe0e8c9
0.0 0.0% 100.0% 0.0 0.0% 0x000000008fe12f36
0.0 0.0% 100.0% 3.4 100.0% __mh_execute_header
0.0 0.0% 100.0% 3.4 100.0% dyld_stub_binding_helper
0.0 0.0% 100.0% 0.0 0.0% std::basic_string::_S_construct
0.0 0.0% 100.0% 0.0 0.0% std::basic_string::basic_string
Original comment by kroe...@gmail.com
on 8 Jan 2010 at 7:08
Attachments:
Yeah, I'm afraid I don't know much about what happens with perftools in the
face of
dlopen. It's also something that likely differs significantly from OS to OS.
If you
want to poke around, that would be great! You could maybe take your test
programs
and add MallocHooks for 'new' that print some information out, to see what
tcmalloc
is seeing.
If tcmalloc is seeing all these allocations, then the next step is to look at
the
heap-profiler output (the raw file) -- it should be pretty easy to understand.
This
will show if the problem is with GetStackTrace -- it's not going beyond the
dlopen
boundary, perhaps -- or with pprof, or with something else.
Original comment by csilv...@gmail.com
on 8 Jan 2010 at 7:20
Will do, thanks for your help so far!
Original comment by kroe...@gmail.com
on 8 Jan 2010 at 7:25
Any more word on this? I'm sorry I can't help much remotely, but I'm
interested to
hear anything you've found out so far.
Original comment by csilv...@gmail.com
on 10 Mar 2010 at 6:55
Hi, I am facing similar problem like this, can anybody help? I am linking the
tcmalloc static library with my project and generating the dynamic library. The
dynamic library calls static function HeapLeakChecker::NoGlobalLeaks. I use
java.exe to load/execute my dynamic library and the environment variable
HEAPCHECK is set to normal. When the program executes the function
NoGlobalLeaks (before calling this I allocated memory using tc_malloc and it is
not freed until this point), it hangs after displaying the message “Check
failed: regions_ != NULL:”. I just did the same thing with sample exe
application but it works fine.
Original comment by shivang...@gmail.com
on 9 Sep 2010 at 9:10
Sounds like it may be similar -- weird things happening due to dynamic loading.
Or it may be totally different. You may have to go through the code and
figure out where regions_ is being set to a non-NULL value, and figure out why
that's not happening for you.
Original comment by csilv...@gmail.com
on 13 Sep 2010 at 8:00
It's been almost a year now with no word, so I'm closing this CannotReproduce.
If someone can come up with a simple test-case that illustrates this, or debug
it themselves, I'm happy to look into it again.
Original comment by csilv...@gmail.com
on 31 Aug 2011 at 10:23
Original issue reported on code.google.com by
kroe...@gmail.com
on 8 Jan 2010 at 4:52