casseopea2 / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

Deadlock when heap check enabled; using python scripts to call C functions #88

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a test suite that tests C code with Python scripts, so SWIG is used
to interface the C and Python. I was trying to use the Google heap leak
checker to identify leaks in the C code. The C code has been compiled with
the tcmalloc library.
When running in minimal, normal, or strict mode, I run into a deadlock, as
described below. In draconian mode the suite runs normally and gives some
results, but they are not what I expected: the C code being executed has
leaks, yet none of them are reported. Instead, some internal leaks in the
heap checker are reported (which is usual when using draconian mode).

What steps will reproduce the problem?
1. Run the python script
env HEAPCHECK=normal python py_script.py

What is the expected output? 

What do you see instead?
Check failed: regions_ != NULL:

What version of the product are you using? 
root/google-perftools-1.0rc2

On what operating system?

Fedora 7 Linux
Kernel: 2.6.23.17-88.fc7 
Architecture : x86_64

Please provide any additional information below.

The trace below was obtained by running pstack on the python process:

#0  0x0000003079e0d480 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1  0x00002aaaaec6491f in SpinLock::SlowLock ()
#2  0x00002aaaaec5c424 in DeleteHook () from /usr/local/lib/libtcmalloc.so.0
#3  0x00002aaaaec671ac in operator delete ()
#4  0x00002aaaaec5f6b1 in __tcf_2 () from /usr/local/lib/libtcmalloc.so.0
#5  0x0000003079233449 in exit () from /lib64/libc.so.6
#6  0x00002aaaaec56856 in MemoryRegionMap::BeginRegionLocked ()
#7  0x00002aaaaec5c92c in RegisterStack () from /usr/local/lib/libtcmalloc.so.0
#8  0x00002aaaaec5d7b6 in HeapLeakChecker::IgnoreNonThreadLiveObjectsLocked ()
#9  0x00002aaaaec5e071 in HeapLeakChecker::IgnoreLiveThreads ()
#10 0x00002aaaaec62cf7 in ListerThread () from /usr/local/lib/libtcmalloc.so.0
#11 0x00002aaaaec6228a in local_clone () from /usr/local/lib/libtcmalloc.so.0
#12 0x0000000000000000 in ?? ()

Original issue reported on code.google.com by Aaditya....@gmail.com on 16 Dec 2008 at 6:11

GoogleCodeExporter commented 9 years ago
It looks like the deadlock is caused by the underlying problem, which is the
assertion failure.  The stacktrace shows that BeginRegionLocked is calling exit,
which is because of the failed check (regions_ != NULL).  So the question is,
why is that happening?  I'll see if I can reproduce it locally.

Does this happen for every python script you run, or just some?  Do you still
see the check failure if you run something like
   env HEAPCHECK=normal python -c pass
?

Original comment by csilv...@gmail.com on 21 Dec 2008 at 2:08
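
A minimal Python sketch of the pattern the trace suggests. It assumes (this is not confirmed in the thread) that exit() runs while the thread still holds the same non-reentrant lock that DeleteHook later tries to take. The names `region_lock`, `delete_hook`, and `failed_check_path` are hypothetical stand-ins, not gperftools code, and a timeout replaces the real spin so the example terminates.

```python
import threading

# Hypothetical stand-in for a non-reentrant spinlock (not gperftools code).
region_lock = threading.Lock()


def delete_hook(timeout=0.1):
    """Stand-in for a hook that needs the lock; returns True if it got it."""
    acquired = region_lock.acquire(timeout=timeout)
    if acquired:
        region_lock.release()
    return acquired


def failed_check_path():
    """Stand-in for a failed check whose cleanup runs while the lock is held."""
    with region_lock:
        # In the trace, exit() runs static destructors at this point, and one
        # of them reaches operator delete -> DeleteHook -> SpinLock::SlowLock.
        return delete_hook()


if __name__ == "__main__":
    print("hook outside lock:", delete_hook())
    print("hook inside lock:", failed_check_path())
```

Outside the lock the hook acquires it immediately; inside, the acquire times out. With a real spinlock there is no timeout, which would match the nanosleep loop in frame #0.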

GoogleCodeExporter commented 9 years ago
Sorry for the late reply.
The scenario is reproducible with all the scripts I have. But since these
differ a little from the usual scripts (they invoke C code), I cannot comment
on ordinary python scripts.

I executed "env HEAPCHECK=normal python -c pass" and it went fine. There was
no output and no visible problem.

Original comment by Aaditya....@gmail.com on 30 Dec 2008 at 1:52

GoogleCodeExporter commented 9 years ago
Hmm, I'm not sure I can solve this without being able to reproduce it.  Is
there an example of a python script + C file you can share, that triggers the
problem for you?  I'll try compiling it all here, to see if I can reproduce it
locally.

Original comment by csilv...@gmail.com on 7 Jan 2009 at 10:33

GoogleCodeExporter commented 9 years ago
I'm closing this bug CannotReproduce.  If you can provide any source code that
triggers the problem for you, I'll be glad to look into it.

Original comment by csilv...@gmail.com on 6 Mar 2009 at 8:13

GoogleCodeExporter commented 9 years ago
Here's an example that produces the problem for me. It uses boost.python, but
the symptoms are the same as above.

`make` to build; `make run` to run.

Original comment by dtcac...@gmail.com on 26 Jan 2011 at 12:29

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the reproducible test case; that's been very helpful.

This is definitely different from the way the heap-checker is intended to be
used, where it's started at program-begin time (rather than at dlopen time, as
happens here).  I don't know if that's the cause of the trouble, though.

The fact that regions_ is NULL means that the tcmalloc hooks were never called 
for mmap or sbrk.  This indicates to me that python (maybe?) is controlling the 
memory access, and that the heap-checker wouldn't show anything interesting 
anyway.  Of course, this may just be because your program doesn't allocate any 
memory.  (Though I tried adding a random malloc() to the .cpp file, and it 
didn't seem to make a difference.)

I tried running things so that python was under tcmalloc as well:
   env LD_PRELOAD=/home/csilvers/opensource/google-perftools/.libs/libtcmalloc.so HEAPCHECK=strict python

That seemed to work fine.  So even though I don't normally recommend using 
LD_PRELOAD, I think it may be the only way to go here (unless you want to 
recompile python so it links in tcmalloc).

I'm closing this WontFix, because I think there's an intrinsic problem with 
trying to start the heap-checker from a dynamic library, and there seems to be 
a workable workaround.  If the workaround doesn't work for you, feel free to 
reopen, but I can't promise how much we'll be able to fix things.  (We can 
probably avoid the assertion failure, but I'm not sure we can replace it by 
something useful...)

Original comment by csilv...@gmail.com on 26 Jan 2011 at 12:57
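
For anyone scripting the LD_PRELOAD workaround from the last comment, here is a minimal sketch of a launcher. The helper names `heapcheck_env` and `run_under_heapcheck` are hypothetical, and the library path is an assumption: pass whatever path libtcmalloc has on your system (the original trace shows /usr/local/lib/libtcmalloc.so.0). The mode names are the four mentioned in this thread.

```python
import os
import subprocess
import sys

# Heap-checker modes mentioned in this thread.
HEAPCHECK_MODES = ("minimal", "normal", "strict", "draconian")


def heapcheck_env(tcmalloc_path, mode="normal", base_env=None):
    """Return a copy of the environment with LD_PRELOAD and HEAPCHECK set."""
    if mode not in HEAPCHECK_MODES:
        raise ValueError("unknown HEAPCHECK mode: %r" % (mode,))
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put tcmalloc first so it is loaded before any other preloaded library.
    env["LD_PRELOAD"] = tcmalloc_path if not existing else tcmalloc_path + ":" + existing
    env["HEAPCHECK"] = mode
    return env


def run_under_heapcheck(script, tcmalloc_path, mode="normal"):
    """Run a python script with tcmalloc preloaded; returns the exit status."""
    return subprocess.call([sys.executable, script],
                           env=heapcheck_env(tcmalloc_path, mode))
```

Usage would look like `run_under_heapcheck("py_script.py", "/usr/local/lib/libtcmalloc.so.0", "strict")`. Building the environment in a separate function keeps it easy to inspect before launching anything.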