cynthia / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

SIGSEGV received when using gperftools 1.10 built with cygwin on x64 #400

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile tcmalloc_minimal for release x64 target in cygwin/mingw
./configure --host=x86_64-w64-mingw32 && make
2. Link application against libtcmalloc_minimal.
3. Run application and receive the following error message:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 904.0x18f0]
SLL_PopRange (this=0x23a0000, src=0x23a0280, cl=37, N=1)
    at src/linked_list.h:80
80      src/linked_list.h: No such file or directory.
        in src/linked_list.h
Current language:  auto
The current source language is "auto; currently c++".
(gdb) bt
#0  SLL_PopRange (this=0x23a0000, src=0x23a0280, cl=37, N=1)
    at src/linked_list.h:80
#1  tcmalloc::ThreadCache::FreeList::PopRange (this=0x23a0000, src=0x23a0280,
    cl=37, N=1) at src/thread_cache.h:217
#2  tcmalloc::ThreadCache::ReleaseToCentralCache (this=0x23a0000,
    src=0x23a0280, cl=37, N=1) at src/thread_cache.cc:231
#3  0x00000000711da3d2 in tcmalloc::ThreadCache::Scavenge (this=0x23a0000)
    at src/thread_cache.cc:250
#4  0x00000000711c2494 in tcmalloc::ThreadCache::Deallocate (ptr=0x2583800,
    invalid_free_fn=0x7fefe840200) at ./src/thread_cache.h:368
#5  do_free_with_callback (ptr=0x2583800, invalid_free_fn=0x7fefe840200)
    at ./src/tcmalloc.cc:1123
#6  0x00000000711c4656 in Perftools_free (ptr=0x2583800)
    at src/windows/patch_functions.cc:805
GDB does not support pointers to methods on this target
(gdb)

What version of the product are you using? On what operating system?
1.10 on Windows 7 using cygwin/mingw

Please provide any additional information below.

Original issue reported on code.google.com by bin....@gmail.com on 1 Feb 2012 at 1:58

GoogleCodeExporter commented 9 years ago
In our experience, this almost always means a memory error in the application 
(double-free or the like).  Such code has undefined behavior, and may work with 
some malloc implementations and crash with others.

Sadly, I don't know of a good way to debug memory problems on windows, but that 
would be where I look first.
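
To illustrate why a double free can go unnoticed with one allocator and crash another, here is a minimal sketch. It assumes a toy free list that stores the "next" pointer inside each free block, which is the same general idea as tcmalloc's thread-cache lists. ToyFreeList and the helper function are hypothetical names, not gperftools code.

```cpp
#include <cassert>
#include <cstddef>

// Toy intrusive singly linked free list: the first word of every free block
// holds the pointer to the next free block. Hypothetical, for illustration.
struct ToyFreeList {
  void* head = nullptr;
  std::size_t length = 0;

  void Push(void* block) {
    *static_cast<void**>(block) = head;  // write "next" into the block itself
    head = block;
    ++length;
  }

  void* Pop() {
    assert(head != nullptr);
    void* block = head;
    head = *static_cast<void**>(block);  // read "next" back out of the block
    --length;
    return block;
  }
};

// "Frees" the same block twice, then reports whether the next two
// "allocations" hand out the same address.
inline bool DoubleFreeHandsOutSameBlockTwice() {
  static void* storage[2][8];  // two fake blocks, each large enough for a pointer
  ToyFreeList list;
  list.Push(storage[0]);
  list.Push(storage[1]);
  list.Push(storage[1]);  // the double free: the block now points at itself
  void* a = list.Pop();
  void* b = list.Pop();
  return a == b;
}
```

After the second push of the same block, the list's length counter says 3 while the chain is really a one-node cycle, so later list walks no longer match the bookkeeping.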

Original comment by csilv...@gmail.com on 1 Feb 2012 at 2:08

GoogleCodeExporter commented 9 years ago
It (application + tcmalloc) works fine when running on ubuntu and centos. 

I modified the following code snippet and it runs longer. I suspect it is 
related to the initialization sequence among modules. Maybe the global pageheap 
hasn't been initialized.

inline void do_free_with_callback(void* ptr, void (*invalid_free_fn)(void*)) {
  if (ptr == NULL) return;
  if (Static::pageheap() == NULL) {
    // We called free() before malloc().  This can occur if the
    // (system) malloc() is called before tcmalloc is loaded, and then
    // free() is called after tcmalloc is loaded (and tc_free has
    // replaced free), but before the global constructor has run that
    // sets up the tcmalloc data structures.
    //(*invalid_free_fn)(ptr);  // Comment out and return directly
    return;
  }
  // ... (remainder of the function unchanged)
}

After a while, I get the following exception. I used gdb to show the call 
stack.

(gdb) bt
#0  0x00000000711c56e2 in SLL_Pop (size=<value optimized out>)
    at ./src/linked_list.h:58
#1  tcmalloc::ThreadCache::FreeList::Pop (size=<value optimized out>)
    at ./src/thread_cache.h:204
#2  tcmalloc::ThreadCache::Allocate (size=<value optimized out>)
    at ./src/thread_cache.h:344
#3  do_malloc (size=<value optimized out>) at ./src/tcmalloc.cc:1068
#4  0x00000000711c7204 in do_malloc_or_cpp_alloc (size=28)
    at ./src/tcmalloc.cc:1005
#5  Perftools_malloc (size=28) at src/windows/patch_functions.cc:797
warning: (Internal error: pc 0x6fce5d09 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x6fce5d09 in read in psymtab, but not in symtab.)

#6  0x000000006fce5d0a in libstdc++-6!_Znwy (warning: (Internal error: pc 0x6fce
5d09 in read in psymtab, but not in symtab.)

) from c:\t1\bin\libstdc++-6.dll
#7  0x00000010ffffffff in ?? ()
#8  0x6573726100000000 in ?? ()
#9  0x0000000000000000 in ?? ()
(gdb) p list
$1 = <value optimized out>
(gdb) p *list
Cannot access memory at address 0x0

Original comment by bin....@gmail.com on 1 Feb 2012 at 9:29

GoogleCodeExporter commented 9 years ago
If you can run the application under linux, then definitely try linking with 
-ltcmalloc_debug there and see if it turns up anything.  Or run it under 
valgrind.  I still think the most likely 
scenario here -- certainly not the only possible one, but the most likely -- is 
application-level error.

Original comment by csilv...@gmail.com on 1 Feb 2012 at 9:46

GoogleCodeExporter commented 9 years ago
I wrote a simple test program. 

OS: windows 7, 64bit, on cygwin/mingw
When I set NUM_LOOP to 100, I never have any problem.
When I set NUM_LOOP to 1000, I consistently get the following core dump:

Starting program: c:\devtest\google-perftools-1.10\google-perftools-1.10\tests/test.exe
[New Thread 6864.0xf88]
[New Thread 6864.0x1464]
[New Thread 6864.0x1fa4]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6864.0x1fa4]
SLL_PopRange (this=0x3e0000, src=0x3e01f0, cl=28, N=5) at src/linked_list.h:80
80      src/linked_list.h: No such file or directory.
        in src/linked_list.h
Current language:  auto
The current source language is "auto; currently c++".
(gdb) directory ..
Source directories searched: c:\devtest\google-perftools-1.10\google-perftools-1.10\tests/..;$cdir;$cwd
(gdb) bt
#0  SLL_PopRange (this=0x3e0000, src=0x3e01f0, cl=28, N=5)
    at src/linked_list.h:80
#1  tcmalloc::ThreadCache::FreeList::PopRange (this=0x3e0000, src=0x3e01f0,
    cl=28, N=5) at src/thread_cache.h:217
#2  tcmalloc::ThreadCache::ReleaseToCentralCache (this=0x3e0000,
    src=0x3e01f0, cl=28, N=5) at src/thread_cache.cc:231
#3  0x000000000041e994 in tcmalloc::ThreadCache::ListTooLong (
    this=<value optimized out>, list=0x3e01f0, cl=<value optimized out>)
    at src/thread_cache.cc:193
#4  0x0000000000407063 in tcmalloc::ThreadCache::Deallocate (ptr=0x808b00,
    invalid_free_fn=0x7fefe850200) at ./src/thread_cache.h:366
#5  do_free_with_callback (ptr=0x808b00, invalid_free_fn=0x7fefe850200)
    at ./src/tcmalloc.cc:1131
#6  0x0000000000408f46 in Perftools_free (ptr=0x808b00)
    at src/windows/patch_functions.cc:809
#7  0x0000000000401546 in worker (arg=0x7d4064) at test.c:13
warning: (Internal error: pc 0x62484dfd in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x62484dfd in read in psymtab, but not in symtab.)

#8  0x0000000062484dfe in pthread_timechange_handler_np (warning: (Internal erro
r: pc 0x62484dfd in read in psymtab, but not in symtab.)

)
   from c:\devtest\google-perftools-1.10\google-perftools-1.10\tests\pthreadGC2.
dll
warning: (Internal error: pc 0x62484dfd in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x7d4063 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x7d4063 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x7d4063 in read in psymtab, but not in symtab.)

#9  0x00000000007d4064 in ?? (warning: (Internal error: pc 0x7d4063 in read in p
symtab, but not in symtab.)

)
warning: (Internal error: pc 0x7d4063 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x42f37b in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x42f37b in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x42f37b in read in psymtab, but not in symtab.)

#10 0x000000000042f37c in __DTOR_LIST__ (warning: (Internal error: pc 0x42f37b i
n read in psymtab, but not in symtab.)

)
warning: (Internal error: pc 0x42f37b in read in psymtab, but not in symtab.)

#11 0x0000000000000000 in ?? ()
(gdb)
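
The attached test program is not reproduced in this thread. A test of the general shape described above (a few worker threads that allocate and free small blocks in a loop, with a configurable loop count) might look roughly like the sketch below. Everything in it is an assumption: the thread count, block size, batch size, and function names are hypothetical, and it uses C++11 std::thread rather than whatever the original test.c used.

```cpp
#include <cassert>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical parameters; the original test's values are not known.
constexpr int kNumThreads = 3;
constexpr int kBlocksPerLoop = 64;
constexpr std::size_t kBlockSize = 64;

// Allocates a batch of small blocks, touches each one, then frees the batch.
// Returns the number of blocks freed by this worker.
inline long Worker(int num_loop) {
  long freed = 0;
  for (int loop = 0; loop < num_loop; ++loop) {
    void* blocks[kBlocksPerLoop];
    for (int i = 0; i < kBlocksPerLoop; ++i) {
      blocks[i] = std::malloc(kBlockSize);
      assert(blocks[i] != nullptr);
      static_cast<char*>(blocks[i])[0] = 1;  // touch the memory
    }
    for (int i = 0; i < kBlocksPerLoop; ++i) {
      std::free(blocks[i]);
      ++freed;
    }
  }
  return freed;
}

// Runs kNumThreads workers concurrently and returns the total blocks freed.
inline long RunStress(int num_loop) {
  std::vector<long> freed(kNumThreads, 0);
  std::vector<std::thread> threads;
  for (int t = 0; t < kNumThreads; ++t) {
    threads.emplace_back([&freed, t, num_loop] { freed[t] = Worker(num_loop); });
  }
  for (auto& th : threads) th.join();
  long total = 0;
  for (long f : freed) total += f;
  return total;
}
```

Calling RunStress(100) and then RunStress(1000) mirrors the two NUM_LOOP settings mentioned above. Freeing in batches makes the per-thread free lists grow, which is the kind of pattern that pushes blocks back to a central cache.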

Original comment by bin....@gmail.com on 2 Feb 2012 at 10:39

GoogleCodeExporter commented 9 years ago
I forgot to mention that 2 out of 16 test cases fail when I run make check on 
build 1.10. They are malloc_hook_test and tcmalloc_minimal_unittest. Both of 
them are multithreaded programs. The error messages are:

Testing threaded allocation/deallocation (10 threads)
Check failed: object.ptr[i] == expected

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Original comment by bin....@gmail.com on 2 Feb 2012 at 10:51

GoogleCodeExporter commented 9 years ago
Ah, if you have failures when running 'make check', that puts a whole new look 
on things!  That should definitely not be happening.  I don't see these 
failures when I run perftools under mingw, but I guess there are different 
versions of mingw out there?  And of course setups that differ in other ways as 
well.

What do you mean when you say cygwin/mingw?  Is it a cygwin environment or a 
mingw environment?  It may be helpful for you to attach the output of your 
config.log file, and also what you get when you run 'cpp -dMM /dev/null'.
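
As a complement to dumping the predefined macros, the same question (cygwin or mingw?) can be asked from inside a program. This is a minimal sketch that relies only on macros these toolchains are known to predefine (__CYGWIN__, __MINGW32__, __MINGW64__, _WIN32, __linux__, __GNUC__). The function name DescribeToolchain is hypothetical.

```cpp
#include <cassert>
#include <string>

// Builds a short description of the target environment from the macros the
// compiler predefines. Several tags can apply at once.
inline std::string DescribeToolchain() {
  std::string out;
#if defined(__CYGWIN__)
  out += "cygwin ";          // Cygwin runtime (POSIX emulation layer)
#endif
#if defined(__MINGW64__)
  out += "mingw-w64-64bit "; // 64-bit mingw-w64 target
#elif defined(__MINGW32__)
  out += "mingw-32bit ";     // 32-bit MinGW or mingw-w64 target
#endif
#if defined(_WIN32)
  out += "win32-api ";       // native Windows API target
#endif
#if defined(__linux__)
  out += "linux ";
#endif
#if defined(__GNUC__)
  out += "gcc-compatible ";
#endif
  if (out.empty()) out = "unknown";
  assert(!out.empty());
  return out;
}
```

Built with a cross compiler such as x86_64-w64-mingw32-g++, this reports the target the compiler generates code for, not the shell the build was started from.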

Original comment by csilv...@gmail.com on 2 Feb 2012 at 10:55

GoogleCodeExporter commented 9 years ago
Thanks for the super quick response. Please see the attached config.log and cpp 
output.

Original comment by bin....@gmail.com on 2 Feb 2012 at 11:02

GoogleCodeExporter commented 9 years ago
And here are two minor changes that I made to get the build working.

https://github.com/bcui6611/google-perftools-1.10/commit/ee7e31f4446eb00a9af200660f2d5de2540b6cce

Original comment by bin....@gmail.com on 2 Feb 2012 at 11:05

GoogleCodeExporter commented 9 years ago
Ah, this looks like a cygwin setup, not a mingw setup.

I've been seeing the same crashes in some of my cygwin tests.  I can't seem to 
debug it, sadly.  I assume it's some bad interaction google has with the cygwin 
runtime, but don't know what.

One thing you might do is follow the instructions in the INSTALL file, which 
mentions some flags to pass ./configure when compiling under cygwin.  I don't 
know if that will help in your case or not.  If not, you may need to dig into 
this with a debugger to see what's going wrong.  I wish I could help more, but 
I'm stumped myself.

Original comment by csilv...@gmail.com on 2 Feb 2012 at 11:15

GoogleCodeExporter commented 9 years ago
As for the patches you had to make, I don't think they should be necessary -- I 
think the problem is that the configure-script is wrongly identifying your 
config as a mingw config when it's not (at least, there's no __MINGW__ #define 
in your compiler).  It is looking at the target triple, which has mingw in it.  
Why is that, btw? -- what is mingw-y about this setup?

Original comment by csilv...@gmail.com on 2 Feb 2012 at 11:19

GoogleCodeExporter commented 9 years ago
We leverage the gcc cross compilation feature. The hosting environment is 
cygwin. But we use the mingw toolchain to generate code for x64 and x86 
platforms. That's why we pass configure the parameter --host=x86_64-w64-mingw32 
or --host=i686-w64-mingw32.

Original comment by bin....@gmail.com on 2 Feb 2012 at 11:37

GoogleCodeExporter commented 9 years ago
Ah, I see.  Cross-compiling for gperftools is not very well tested at the 
moment, so I'm not surprised you're having some trouble.  But it's definitely 
strange you're seeing these crashes that, in my experience, are cygwin related, 
on code that's targeted to mingw.  Of course, I'm fuzzy on how cygwin and mingw 
work in any case, so maybe it makes sense to people who know more about it.

In any case, I probably won't be able to help you debug this that much.  As for 
your patches, they seem plausible to me but probably can't go in as-is.  For 
instance, I don't think my version of mingw defines MemoryBarrier.  So you'll 
either have to check the version number (like we already do for msvc) or else 
do some sort of configure-time check to figure this out.  But I'm not sure -- 
this might also be a cross-compiling issue.
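
One possible shape for the configure-time approach is sketched below. It assumes a hypothetical macro, SKETCH_HAVE_WIN_MEMORYBARRIER, that a configure check would define when the toolchain's windows.h provides MemoryBarrier(). The fallback uses the C++11 std::atomic_thread_fence, which was not an option for this code base in 2012, so this is a modern stand-in rather than the actual patch.

```cpp
#include <atomic>
#include <cassert>

// SKETCH_HAVE_WIN_MEMORYBARRIER is hypothetical: a configure-time check would
// define it only if MemoryBarrier() compiles on the toolchain in use.
#if defined(SKETCH_HAVE_WIN_MEMORYBARRIER)
#include <windows.h>
inline void FullBarrier() { MemoryBarrier(); }
#else
// Portable fallback: a sequentially consistent fence.
inline void FullBarrier() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
#endif

// Tiny publish/consume pair: the barrier keeps the payload write ordered
// before the flag write, and the flag read ordered before the payload read.
struct Mailbox {
  int payload = 0;
  std::atomic<bool> ready{false};

  void Publish(int value) {
    payload = value;
    FullBarrier();
    ready.store(true, std::memory_order_relaxed);
  }

  bool TryRead(int* out) {
    if (!ready.load(std::memory_order_relaxed)) return false;
    FullBarrier();
    *out = payload;
    return true;
  }
};
```

The point of the guard is that the decision is made once, at configure time, instead of being inferred from the target triple.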

Is it possible for you to build this natively on mingw?  If you do, do all 
tests pass?

Original comment by csilv...@gmail.com on 2 Feb 2012 at 11:43

GoogleCodeExporter commented 9 years ago
We found the root cause of the memory corruption. After we add the following 
directive to src/windows/mingw.h, the exceptions are no longer raised.

#undef HAVE_TLS

It looks like something is wrong with TLS support under cygwin/mingw for the 
target host x86_64-w64-mingw32.
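
For context, a switch like HAVE_TLS typically selects between compiler-supported thread-local storage and a key-based lookup for the per-thread heap pointer. Here is a rough sketch of that kind of switch. SKETCH_HAVE_TLS and ThreadHeap are hypothetical stand-ins, not gperftools code, and C++11 thread_local is used where older code would use a compiler extension.

```cpp
#include <cassert>

// Stand-in for a per-thread cache object (hypothetical).
struct ThreadHeap {
  int id;
};

// Remove this define to mimic "#undef HAVE_TLS" and take the fallback path.
#define SKETCH_HAVE_TLS 1

#if defined(SKETCH_HAVE_TLS)
// Fast path: the compiler and runtime provide one pointer per thread.
static thread_local ThreadHeap* tls_heap = nullptr;
inline ThreadHeap* GetThreadHeap() { return tls_heap; }
inline void SetThreadHeap(ThreadHeap* heap) { tls_heap = heap; }
#else
// Fallback path: per-thread pointer stored under a pthread key.
#include <pthread.h>
static pthread_key_t heap_key;
static pthread_once_t heap_key_once = PTHREAD_ONCE_INIT;
static void MakeHeapKey() {
  int rc = pthread_key_create(&heap_key, nullptr);
  assert(rc == 0);
  (void)rc;
}
inline ThreadHeap* GetThreadHeap() {
  pthread_once(&heap_key_once, MakeHeapKey);
  return static_cast<ThreadHeap*>(pthread_getspecific(heap_key));
}
inline void SetThreadHeap(ThreadHeap* heap) {
  pthread_once(&heap_key_once, MakeHeapKey);
  pthread_setspecific(heap_key, heap);
}
#endif
```

If the fast path is broken on a given toolchain, two threads can end up sharing or misreading the pointer, and each then manipulates free lists it does not own. That would be consistent with the list corruption seen in the stack traces, though the thread does not establish the exact mechanism.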

Original comment by bin....@gmail.com on 10 Feb 2012 at 11:13

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Let's try this again :)

It sounds like this issue can now be closed off as an issue with 
x86_64-w64-mingw32 and not gperftools, correct? Or do we need to look further 
into why HAVE_TLS is causing issues? I am curious to hear some more details on 
how you discovered this was the issue.

Original comment by chapp...@gmail.com on 2 Mar 2012 at 5:02

GoogleCodeExporter commented 9 years ago
The various crashes all pointed to thread-related problems. 

We happened to test this flag by comparing test results generated from two 
build environments:
1. build from cygwin/mingw for target host x86_64-w64-mingw32
2. build from linux for target host x86_64-w64-mingw32. 

In the first case, we had HAVE_TLS=1 because we thought it was supported.
In the second case, we purposely set HAVE_TLS=0 because we really don't trust 
gcc for TLS. And it turns out this is the key.

Original comment by bin....@gmail.com on 2 Mar 2012 at 5:32

GoogleCodeExporter commented 9 years ago
Ok great. Is there a patch that you would like to recommend to make this the 
correct choice during configuration?

Original comment by chapp...@gmail.com on 21 Apr 2012 at 6:41

GoogleCodeExporter commented 9 years ago
Any chance we can just close this off as not worth addressing? 1.10 is getting 
quite old now. Unless this is an issue that still exists in 1.9/1.10/2.0 I am 
tempted to close it off.

Original comment by chapp...@gmail.com on 22 Dec 2012 at 8:17

GoogleCodeExporter commented 9 years ago
I believe that the TLS issue has been fixed by alkondratenko and a patch is 
coming in shortly (see issue-504).

Original comment by chapp...@gmail.com on 11 Mar 2013 at 2:15

GoogleCodeExporter commented 9 years ago

Original comment by chapp...@gmail.com on 11 Mar 2013 at 2:16

GoogleCodeExporter commented 9 years ago
I seem to have hit this or a similar issue using v2.1:

What steps will reproduce the problem?
1. Not easily reproducible.
2. Occurred on exit running static destructors.
3. Application got SEGV in gperftools code

What is the expected output? What do you see instead?

Got SEGV, should not happen.

What version of the product are you using? On what operating system?

Using v2.1 of gperftools on
Centos 4: Linux h3-centos4x32 2.6.9-103.ELsmp #1 SMP Fri Dec 9 04:31:51 EST 
2011 i686 i686 i386 GNU/Linux

Please provide any additional information below.

Here is the stack trace at the point of SEGV:

#0  0x081cd589 in GlobalSignalTermHandler (sig=11) at sysParam.cpp:1301
#1  <signal handler called>
#2  SLL_PopRange (this=0x97928d8, src=0x9792900, cl=1, N=20) at 
src/linked_list.h:44
#3  PopRange (this=0x97928d8, src=0x9792900, cl=1, N=20) at 
src/thread_cache.h:228
#4  tcmalloc::ThreadCache::ReleaseToCentralCache (this=0x97928d8, 
src=0x9792900, cl=1, N=20) at src/thread_cache.cc:229
#5  0x08470f30 in tcmalloc::ThreadCache::ListTooLong (this=0x97928d8, 
list=0x9792900, cl=1) at src/thread_cache.cc:191
#6  0x084665b3 in do_free_helper (ptr=0x9f16080, invalid_free_fn=0x8463090 
<(anonymous namespace)::InvalidFree(void*)>) at src/thread_cache.h:389
#7  (anonymous namespace)::do_free_with_callback (ptr=0x9f16080, 
invalid_free_fn=0x8463090 <(anonymous namespace)::InvalidFree(void*)>) at 
src/tcmalloc.cc:1210
#8  0x084866b1 in do_free (p=0x9f16080) at src/tcmalloc.cc:1219
#9  tc_delete (p=0x9f16080) at src/tcmalloc.cc:1619
#10 0x0811b0b7 in __gnu_cxx::new_allocator<std::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >::deallocate (this=0x859e794, 
__p=0x9f16080) at /usr/include/c++/3.4.3/ext/new_allocator.h:86
#11 0x0811b0e1 in std::_Vector_base<std::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::allocator<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> > > >::_M_deallocate (this=0x859e794, __p=0x9f16080, 
__n=1) at /usr/include/c++/3.4.3/bits/stl_vector.h:117
#12 0x0811b11a in std::_Vector_base<std::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::allocator<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> > > >::~_Vector_base (this=0x859e794, __in_chrg=<value 
optimized out>) at /usr/include/c++/3.4.3/bits/stl_vector.h:106
#13 0x0811b157 in std::vector<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > >::~vector (this=0x859e794, 
__in_chrg=<value optimized out>) at /usr/include/c++/3.4.3/bits/stl_vector.h:256
#14 0x081c32c2 in __tcf_37 () at sysParam.cpp:347
#15 0x00aca6b7 in exit () from /lib/tls/libc.so.6
#16 0x0820bc73 in Exit::exitNow (this=0x9f4e0b0, isFlumeTest=false) at 
Exit.cpp:117
#17 0x08111060 in main (argc=22, argv=0xbff2ef64, envp=0xbff2efc0) at 
main.cpp:132
#18 0x00ab4eb3 in __libc_start_main () from /lib/tls/libc.so.6
#19 0x081107a1 in _start ()

The object being destructed in frame 14 is a static member variable:

  vector<string> CSysParam::_excludeDirs(1, ".*");

I've run this test for hundreds of thousands of iterations, and I've only 
gotten this bug once.  My application is multi-threaded, with 3 threads; 
however, at the time of the SEGV, only one thread, the main thread, was still 
running.  I was unable to print variable 'tmp' because it was optimized out by 
the compiler, but I'm guessing that 'tmp' is NULL and in SLL_Next() it tries to 
dereference the NULL pointer and gets the SEGV.  I guess the linked list was 
somehow corrupted.  I was able to print 'i' and see its value was 13 for the 
loop.
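
The failure mode guessed at above, a range pop that walks off the end of a list holding fewer nodes than its length counter claims, can be sketched as follows. The helpers are simplified stand-ins modeled loosely on an intrusive list, not the gperftools source, and the checked variant is a hypothetical diagnostic aid rather than a proposed fix.

```cpp
#include <cassert>

// Simplified intrusive-list helpers: "next" lives in the node's first word.
inline void* Next(void* node) { return *static_cast<void**>(node); }
inline void SetNext(void* node, void* next) {
  *static_cast<void**>(node) = next;
}

// Detaches the first n nodes from *head. Returns false, leaving the list
// untouched, if the list holds fewer than n nodes. An unchecked version
// would dereference a null pointer at that point.
inline bool CheckedPopRange(void** head, int n, void** start, void** end) {
  *start = nullptr;
  *end = nullptr;
  if (n <= 0) return true;
  void* tmp = *head;
  if (tmp == nullptr) return false;
  for (int i = 1; i < n; ++i) {
    tmp = Next(tmp);
    if (tmp == nullptr) return false;  // list is shorter than claimed
  }
  *start = *head;
  *end = tmp;
  *head = Next(tmp);
  SetNext(tmp, nullptr);
  return true;
}

// Builds a list with `actual` nodes, then tries to pop `claimed` nodes.
inline bool PopFromListOfLength(int actual, int claimed) {
  static void* nodes[32][2];
  assert(actual >= 0 && actual <= 32);
  void* head = nullptr;
  for (int i = 0; i < actual; ++i) {
    SetNext(nodes[i], head);
    head = nodes[i];
  }
  void* start = nullptr;
  void* end = nullptr;
  return CheckedPopRange(&head, claimed, &start, &end);
}
```

With a consistent list, popping 20 of 20 nodes succeeds. With only 5 real nodes behind a claimed length of 20, the walk reaches a null pointer partway through, which is where an unchecked loop would fault.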

Original comment by Willia...@gmail.com on 6 Sep 2013 at 8:34