cuitao2046 / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

SEGV in tcmalloc #582

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. I am using tcmalloc in my code.
2. I am also using leveldb in my code.
3. I see a SEGV in tcmalloc while leveldb runs background compaction, which 
crashes my program.

What is the expected output? What do you see instead?

I see a SEGV in tcmalloc instead of a functioning program.

What version of the product are you using? On what operating system?

I am using gperftools version 2.1 on Ubuntu 12.04.

Please provide any additional information below.

Here is the stack trace from my optimized binaries at the time of the SEGV:

#0  SLL_Next (t=0x10715b74bec51b15) at src/linked_list.h:44
#1  SLL_Pop (list=<optimized out>) at src/linked_list.h:58
#2  Pop (this=0x18de2e0) at src/thread_cache.h:215
#3  Allocate (cl=1,
    size=<error reading variable: Cannot access memory at address 0x8>,
    this=<optimized out>) at src/thread_cache.h:367
#4  do_malloc_small (
    size=<error reading variable: Cannot access memory at address 0x8>,
    heap=<optimized out>) at src/tcmalloc.cc:1088
#5  do_malloc_no_errno (size=8) at src/tcmalloc.cc:1095
#6  cpp_alloc (nothrow=false, size=8) at src/tcmalloc.cc:1423
#7  tc_new (size=8) at src/tcmalloc.cc:1601
#8  0x00007f913e6e59ce in allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/4.6/ext/new_allocator.h:92
#9  _M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/4.6/bits/stl_vector.h:150
#10 std::vector<unsigned int, std::allocator<unsigned int> >::_M_insert_aux (
    this=0x25bb448, __position=..., __x=<optimized out>)
    at /usr/include/c++/4.6/bits/vector.tcc:327
#11 0x00007f913d76a9e7 in leveldb::BlockBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#12 0x00007f913d76e8e1 in leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#13 0x00007f913d7527f2 in leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#14 0x00007f913d752ff2 in leveldb::DBImpl::BackgroundCompaction() ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#15 0x00007f913d753acb in leveldb::DBImpl::BackgroundCall() ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#16 0x00007f913d77319f in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#17 0x00007f913e921e9a in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f913da7accd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

Any ideas?

thanks,
Sameer

Original issue reported on code.google.com by sameer.s...@gmail.com on 17 Oct 2013 at 7:23

GoogleCodeExporter commented 9 years ago
I reinstalled gperftools and recompiled my code as debug binaries. I hit the 
SEGV again, this time at src/central_freelist.cc:298:

    span->objects = *(reinterpret_cast<void**>(result));

Here is the call stack this time:

#0  tcmalloc::CentralFreeList::FetchFromSpans (this=0x7f9898f1caa0)
    at src/central_freelist.cc:298
#1  0x00007f9898cf1078 in tcmalloc::CentralFreeList::RemoveRange (
    this=0x7f9898f1caa0, start=0x7f982d066480, end=0x7f982d066488, N=166)
    at src/central_freelist.cc:269
#2  0x00007f9898cf4202 in tcmalloc::ThreadCache::FetchFromCentralCache (
    this=0x865e18, cl=<optimized out>, byte_size=8) at src/thread_cache.cc:160
#3  0x00007f9898d052d8 in Allocate (cl=<optimized out>, size=<optimized out>,
    this=<optimized out>) at src/thread_cache.h:364
#4  do_malloc_small (size=<optimized out>, heap=<optimized out>)
    at src/tcmalloc.cc:1088
#5  do_malloc_no_errno (size=5) at src/tcmalloc.cc:1095
#6  cpp_alloc (nothrow=false, size=5) at src/tcmalloc.cc:1423
#7  tc_newarray (size=5) at src/tcmalloc.cc:1631
#8  0x00007f98981e7574 in leveldb::Status::Status(leveldb::Status::Code, leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#9  0x00007f98981d80be in leveldb::Version::Get(leveldb::ReadOptions const&, leveldb::LookupKey const&, std::string*, leveldb::Version::GetStats*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#10 0x00007f98981bd98c in leveldb::DBImpl::Get(leveldb::ReadOptions const&, leveldb::Slice const&, std::string*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1

< valid stack from my code below>

I am using the --enable-frame-pointers option when configuring, since I do not 
want to install libunwind.

Original comment by sameer.s...@gmail.com on 17 Oct 2013 at 10:54
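
Both traces fault while following a "next" link stored inside a free block (SLL_Next in the first trace, the statement at central_freelist.cc:298 in the second). The thread does not establish a root cause, but the sketch below shows one way such a fault can arise: a free list that keeps its link in the first word of each free block is corrupted by any write to a block after it has been freed. This is a simplified model with hypothetical names (FreeListPush, FreeListPop, HeadAfterStaleWrite, kGarbage), not the gperftools source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified model of an intrusive singly linked free list.
// Each free block stores the pointer to the next free block in its first word.

static_assert(sizeof(std::uintptr_t) == sizeof(void*),
              "sketch assumes pointer-sized integers");

// Arbitrary bit pattern standing in for application data (assumption).
const std::uintptr_t kGarbage = 0x1234;

// Push a block on the list: block->next = *head; *head = block.
inline void FreeListPush(void** head, void* block) {
  *static_cast<void**>(block) = *head;
  *head = block;
}

// Pop the first block: the new head is whatever the block's first word holds.
inline void* FreeListPop(void** head) {
  void* result = *head;
  *head = *static_cast<void**>(result);
  return result;
}

// Simulates a stale write into a block that is already on the free list and
// returns the bit pattern the list head holds afterwards.
inline std::uintptr_t HeadAfterStaleWrite() {
  void* a[1];  // two pointer-sized "blocks"
  void* b[1];
  void* head = nullptr;
  FreeListPush(&head, a);
  FreeListPush(&head, b);  // list is now: b -> a -> null

  // The application still holds a pointer to b and writes through it,
  // overwriting the link that the free list stored there.
  std::memcpy(b, &kGarbage, sizeof(kGarbage));

  void* got = FreeListPop(&head);  // returns b; head now holds the garbage
  assert(got == static_cast<void*>(b));
  (void)got;

  std::uintptr_t bits;
  std::memcpy(&bits, &head, sizeof(bits));
  return bits;
}
```

In this model the pop that returns the overwritten block succeeds, and the allocator only faults on a later pop, when it dereferences the garbage head. That would be consistent with a crash appearing in an unrelated allocation (here, inside leveldb) rather than at the site of the bad write.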

GoogleCodeExporter commented 9 years ago
Thanks for raising this.

First, I need to be sure that this is not a bug in your application. Have you 
tried running without tcmalloc but with either valgrind or AddressSanitizer?

Alternatively, if you can attach a test program, or link to one, that would 
help too.

Original comment by alkondratenko on 17 Oct 2013 at 4:36

GoogleCodeExporter commented 9 years ago
Yes, I have run it successfully under valgrind without tcmalloc.

It is a rather complex system to share in isolation. I will see how I can share 
a repro.

Is there any additional logging and/or profiling that I can do to help you 
identify the problem? It repros consistently in my environment.

Original comment by sameer.s...@gmail.com on 18 Oct 2013 at 7:04

GoogleCodeExporter commented 9 years ago
OK. Can you share your compiler and toolchain?

Recently somebody reported a very similar-looking issue on Windows with the 
Intel compiler. I suspect we might have a pointer aliasing bug that a more 
aggressive compiler is hitting.

In order to investigate that possibility:

* can you try with -O0 or something like that?

* can you try with -fno-strict-aliasing (my understanding is that both clang 
and icc support this flag on GNU/Linux)?

* can you post the disassembly of FetchFromSpans?

Original comment by alkondratenko on 18 Oct 2013 at 3:37
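
For context on the aliasing hypothesis: under strict aliasing, an optimizer may assume that memory is only accessed through an lvalue of a compatible type, so reading a link out of raw block memory through a cast pointer can in principle be reordered or dropped. Whether the cast in FetchFromSpans actually runs into this is not established in this thread. The sketch below, using hypothetical helpers LoadLink and StoreLink that are not part of gperftools, shows the memcpy-based form that is well defined regardless of -fstrict-aliasing.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers (not gperftools API): read and write a pointer-sized
// link stored at the start of a raw memory block without type punning.

inline void* LoadLink(const void* block) {
  void* next;
  std::memcpy(&next, block, sizeof(next));  // byte copy, no aliasing assumption
  return next;
}

inline void StoreLink(void* block, void* next) {
  std::memcpy(block, &next, sizeof(next));
}

// Stores a link into raw byte storage and checks that it reads back intact.
inline bool LinkRoundTrips() {
  alignas(void*) unsigned char block[sizeof(void*)];
  int target = 0;
  StoreLink(block, &target);
  return LoadLink(block) == &target;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load or store, so the form costs nothing at run time. Building with -fno-strict-aliasing, as suggested above, is the quicker experiment because it needs no source changes.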

GoogleCodeExporter commented 9 years ago
I am using g++ version 4.6.3 on Ubuntu.

Note that I was getting the error even with a debug (-g) build.

We refactored and rewrote some code to accommodate additional functionality in 
our system, and now the issue does not seem to reproduce. Some of the changes 
were in areas around which the failures were seen. However, I haven't yet fully 
analyzed where, or whether, there were any issues in our code.

Original comment by sameer.s...@gmail.com on 20 Oct 2013 at 3:47

GoogleCodeExporter commented 9 years ago
While analyzing some valgrind errors, I was able to isolate a small standalone 
leveldb program that produces errors similar to those in our code. I have 
posted it to the leveldb issue here: 
http://code.google.com/p/leveldb/issues/detail?id=211 . I am posting the update 
here as well in case it is related.

Original comment by sameer.s...@gmail.com on 23 Oct 2013 at 6:47

GoogleCodeExporter commented 9 years ago
Closing then. Please reopen if you can reproduce it.

Original comment by alkondratenko on 27 Oct 2013 at 12:12