Closed GoogleCodeExporter closed 9 years ago
I have the same issue using Google Chrome 5.0.360.0-42309 on openSUSE 11.1
(kernel
2.6.27.45-0.1-default).
third_party/tcmalloc/chromium/src/tcmalloc_linux.cc:373] Attempt to free
invalid
pointer: 0xa556008
Aborted
Original comment by jerzyglowacki
on 29 Mar 2010 at 11:51
This means that perftools is trying to free some memory that was allocated
before
perftools started running. Presumably, it is due to an allocation in the libc
resolv
api or some such.
I think this is just a hazard of using LD_PRELOAD to load tcmalloc, which is
one
reason we don't recommend using that approach. What happens if you link your
process
with -ltcmalloc_minimal, and then run that? -- does it work properly?
This is the same as issue 210, I think.
Original comment by csilv...@gmail.com
on 29 Mar 2010 at 5:32
Just to add, Google Chrome 5.0.307.1 worked properly on openSUSE 11.1 (kernel
2.6.27.42-0.1-default).
Original comment by jerzyglowacki
on 29 Mar 2010 at 10:22
That may have to do with the order of when chrome calls certain libc functions
--
especially if it does so in global constructors. I think LD_PRELOAD is just a
fragile
solution: it may work one day, but a seemingly innocuous change will make it
break the
next.
Original comment by csilv...@gmail.com
on 29 Mar 2010 at 10:57
Are you still seeing this problem? Did you ever try linking the chrome build
with -ltcmalloc_minimal rather than using LD_PRELOAD?
Original comment by csilv...@gmail.com
on 2 Aug 2010 at 4:42
We are seeing this on x86_64 SuSE 11 with a binary built with -ltcmalloc_minimal
Backtrace:
[0x00007F5949C4EC33] abort + 387 (/lib64/libc.so.6)
[0x0000000000AC95EF] ? (splunkd)
[0x0000000000AC97A6] _ZN22TCMalloc_CrashReporter12PrintfAndDieEPKcz + 150 (splunkd)
[0x0000000000AC1F8B] _ZN123_GLOBAL__N__ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE11InvalidFreeEPv + 43 (splunkd)
[0x0000000000DDE0F5] tc_free + 453 (splunkd)
[0x00007F5949CFB11D] __res_iclose + 189 (/lib64/libc.so.6)
[0x00007F5949D26244] ? (/lib64/libc.so.6)
[0x00007F5949D261D2] __libc_thread_freeres + 34 (/lib64/libc.so.6)
[0x00007F594B204083] ? (/lib64/libpthread.so.0)
[0x00007F5949CEE11D] clone + 109 (/lib64/libc.so.6)
Linux / dnpcor3splk002 / 2.6.32.13-0.5-default / #1 SMP 2010-07-20 14:12:19 +0200 / x86_64
It seems the class of problem is not specific to LD_PRELOAD, but is likely
specific to non-SP SLES 11.
Original comment by jrod...@gmail.com
on 25 Feb 2011 at 10:24
Incidentally that is with tcmalloc from perftools 1.4
Original comment by jrod...@gmail.com
on 25 Feb 2011 at 10:25
What is the exact link line you made to build the binary? Is it possible
-ltcmalloc_minimal was not last on the link line? That would explain the crash
you're seeing.
Original comment by csilv...@gmail.com
on 25 Feb 2011 at 10:30
Even when tcmalloc is linked in statically? I guess I can see where this is
encouraged in the mailing list some places, but the docs suggest it is needed
for heap profiling, and the source comments suggest it is in the context of
dynamic linking.
If linking in tcmalloc_minimal statically is unsafe if other libraries may come
later, this should be promoted in the docs, I think.
The link line is long, and lpthread follows ltcmalloc_minimal, so we'll invert.
Original comment by jrod...@gmail.com
on 28 Feb 2011 at 9:36
I don't know enough about how linkers work to say what happens when tcmalloc is
linked in statically. Ideally, everything would work ok (and link order
wouldn't matter). I have no idea how close reality is to this ideal. :-}
Original comment by csilv...@gmail.com
on 28 Feb 2011 at 9:54
Changing the link line is not sufficient. Bug remains.
Occurs on SLES 11, SLES 11 SP1, OpenSuSE 11.3.
We can now exercise it on demand, but the on-demand binary isn't publically
available at the moment. Will ask internally.
Original comment by jrod...@gmail.com
on 3 Mar 2011 at 8:59
If you're able to experiment, is it possible to link tcmalloc.so rather than
tcmalloc.a to your application?
The comment 6 backtrace makes it seem like the problem is tcmalloc trying to
free memory that the thread library allocated from libc. Are you linking in
lpthread statically as well, or is that dynamic?
I wonder what could be going on.
Original comment by csilv...@gmail.com
on 3 Mar 2011 at 10:09
OK, I think I found the cause of this. Digging through the SuSE glibc SRPM i
found a patch called "glibc-nss-deepbind.diff". Apparently this has the known
side effect of breaking overridden malloc()/free() and was rejected from glibc
mainline for that reason, but SuSE decided to take the patch anyway. So I
guess any replacement malloc just can not be used on SuSE.
Shame that they didn't at least provide some way of overriding this via an
environment variable or something. Also odd that it doesn't seem to crash
*every* time but I guess that is just up to how the resolv library happens to
allocate per-thread memory
Here is their patch in its entirety:
-------------------- CUT HERE --------------------
Use DEEPBIND to load the nss modules. Helps thunderbird (linked against its
own version of the ldap libs) when using nss_ldap (linked against system
libldap) leading to crashes due to incompatibilities.
This has a downside: Linking against libraries overriding malloc() and free()
will break (unless the malloc()'d pointers by glibc are free()able by these).
This is fixable in principle, just needs some work.
See https://bugzilla.novell.com/show_bug.cgi?id=157078 and
http://sourceware.org/bugzilla/show_bug.cgi?id=6610
Index: nss/nsswitch.c
===================================================================
--- nss/nsswitch.c.orig
+++ nss/nsswitch.c
@@ -361,7 +361,9 @@ __nss_lookup_function (service_user *ni,
".so"),
__nss_shlib_revision);
- ni->library->lib_handle = __libc_dlopen (shlib_name);
+ ni->library->lib_handle
+ = __libc_dlopen_mode (shlib_name,
+ RTLD_LAZY | __RTLD_DLOPEN | RTLD_DEEPBIND);
if (ni->library->lib_handle == NULL)
{
/* Failed to load the library. */
Original comment by mitchbl...@gmail.com
on 4 Mar 2011 at 1:50
Also, that explains why using an IPv6 nameserver would cause the problem to be
more reproducible (at least according to the initial bug report) If you have
an IPv6 nameserver in resolv.conf, then __res_vinit() will do a
malloc(sizeof(struct sockaddr_in6))) to store its address in, but IPv4
addresses can go directly in the res_state structure.
There are also places in res_send.c that also malloc() to put things in that
structure, but I don't think they're IPv6-specific (internally libresolv seems
to be using sockaddr_in6 to hold both IPv4 and IPv6 addresses, so I think it
might be allocating for IPv4 in some cases) Maybe if it has to do a recursive
query? I haven't dug too far in res_send.c to see the exact situations that
would cause it to do an allocation.
Original comment by mitchbl...@gmail.com
on 4 Mar 2011 at 1:58
Nice detective work! Thanks for trackign this down.
It's been a goal of mine to make it so we detect memory allocated by libc
malloc, and pass it to libc free to free. This is doable using dlsym_next, I
think, but that adds an -ldl requirement that I'd rather not do. It may also
be possible using __libc_free, but right now tcmalloc overrides that as well
(due to an ancient redhat 9 bug).
But if you want to try it, you can play around with this -- this might be an
excellent testcase, actually.
I haven't tried it, but I *think* the following will work:
1) in tcmalloc.cc and debugallocation.cc, remove the lines like this:
void* __libc_malloc(size_t size) ALIAS("tc_malloc");
void* __libc_malloc(size_t size) { return malloc(size); }
extern void* __libc_malloc(size_t size);
2) In tcmalloc, edit InvalidFree to just do
__libc_free(ptr);
It would be nice to fix the other two Invalid*() sites too, but there doesn't
seem to be a __libc_malloc_usable_size. Hopefully fixing free() will be enough
to go on.
If you have a chance to give this a try, let us know how it works!
Original comment by csilv...@gmail.com
on 4 Mar 2011 at 2:08
Making a patch and testing. Thanks.
Original comment by jrod...@gmail.com
on 4 Mar 2011 at 2:44
So far, so good. I think the problem is worked around with that change.
We only use tcmalloc_minimal, generally speaking, with all the features turned
off, so went for the basics:
diff -Naur google-perftools-1.4.orig/src/tcmalloc.cc
google-perftools-1.4/src/tcmalloc.cc
--- google-perftools-1.4.orig/src/tcmalloc.cc 2009-09-11 08:59:15.000000000
-0700
+++ google-perftools-1.4/src/tcmalloc.cc 2011-03-03 22:36:21.000000000 -0800
@@ -136,6 +136,8 @@
# define WIN32_DO_PATCHING 1
#endif
+extern "C" void __libc_free(void *);
+
using tcmalloc::PageHeap;
using tcmalloc::PageHeapAllocator;
using tcmalloc::SizeMap;
@@ -308,7 +310,7 @@
// our own implementations of malloc/free, we need to make sure that
// the __libc_XXX variants (defined as part of glibc) also point to
// the same implementations.
-#ifdef __GLIBC__ // only glibc defines __libc_*
+#ifdef DONT_DO_THIS_______GLIBC__ // only glibc defines __libc_*
extern "C" {
#ifdef ALIAS
void* __libc_malloc(size_t size) ALIAS("tc_malloc");
@@ -334,7 +336,7 @@
}
#endif // #ifdef ALIAS
} // extern "C"
-#endif // ifdef __GLIBC__
+#endif // ifdef DONT_DO_THIS_______GLIBC__
#undef ALIAS
@@ -350,7 +352,7 @@
// required) kind of exception handling for these routines.
namespace {
void InvalidFree(void* ptr) {
- CRASH("Attempt to free invalid pointer: %p\n", ptr);
+ __libc_free(ptr);
}
size_t InvalidGetSizeForRealloc(void* old_ptr) {
We ostensibly work on RH9, though I doubt anyone uses it. I suppose that would
result in a similar category of problem?
Original comment by jrod...@gmail.com
on 4 Mar 2011 at 7:44
Great! -- I'm glad the patch fixes things. I've been curious how well it would
work; I didn't know how to write a realistic test for it, so the DEEPBIND thing
is useful for that purpose alone.
The problem with RH9 is described a bit at the bottom of tcmalloc.cc -- it has
to do with some old libc code wrongly using free() instead of __libc_free() in
some (rare) circumstances. If I understand the bug correctly, it might work to
just override
__libc_memalign. I'll see if I can find anyone who remembers the circumstances
behind this bug better than I do.
Original comment by csilv...@gmail.com
on 4 Mar 2011 at 8:59
Sorry, mitch points out RH9 doesn't exist for x86_64, and we don't use tcmalloc
for i386 (we had some problems.. maybe the rh9 ones, and for some historical
reasons our app has drastically superior performance on x86_64 anyway).
Short version, the RH9 issue doesn't matter to the practical problem of this
bug at this time, for us.
I'll be beating on it tomorrow. If there's anything in particular you're
intersted in, or if you would want to see the program (it's a huuuge beast, i
don't recommend much) I can ask.
Original comment by jrod...@gmail.com
on 4 Mar 2011 at 9:31
I've been thinking about this problem some more, and I think the right fix --
one that should work on redhat 9 and for your problem -- is different from the
one I proposed above. Instead, I think we should use the __malloc_hook/etc
functionality.
The way this works, is we'll write a function in tcmalloc that assigns hooks
when possible:
http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html
It will look something like this:
static void* mz_malloc(size_t size, const void *caller) { return tc_malloc(size); }
etc // for free, realloc, memalign
void (*__malloc_initialize_hook) (void) = my_init_hook;
static void my_init_hook (void) {
__malloc_hook = mz_malloc;
__free_hook = mz_free;
etc // realloc and memalign
}
We might want to save the old __malloc_hook/etc values as well; not sure yet if
that's necessary.
This totally replaces the above: we *keep* the __libc_* aliasing, etc. But for
code that has DEEPBIND set and calls the libc malloc anyway, these hooks will
take over and hopefully do the right thing.
My worry is that DEEPBIND will affect __malloc_hook/etc, so our override of
these values won't take. I think it should be ok, though. Do you want to give
it a try and see how it works?
Original comment by csilv...@gmail.com
on 4 Mar 2011 at 9:58
I've confirmed that the fix csilver discussed in comment #20 resolves this
issue and submitted a patch.
Stashing away the old hooks appears to not be necessary, and I noted no
performance impact as a result of this change.
Original comment by frobn...@gmail.com
on 9 May 2011 at 3:39
Patch and appropriate release attached.
Original comment by frobn...@gmail.com
on 9 May 2011 at 8:09
Attachments:
In further testing, I've discovered the patch you've given doesn't work when
linking with -static (obsolete, I know, but something we should support).
Instead I'm trying the following patch, which seems to work as well:
---
void* (*__malloc_hook)(size_t, const void*) = &mz_malloc;
void* (*__realloc_hook)(void*, size_t, const void*) = &mz_realloc;
void (*__free_hook)(void*, const void*) = &mz_free;
void* (*__memalign_hook)(size_t,size_t, const void*) = &mz_memalign;
// No malloc_init_hook or __malloc_initialize_hook
---
Can you check whether this works for you as well? If so, I'll get it into the
next release.
Original comment by csilv...@gmail.com
on 1 Jun 2011 at 10:26
The patch you've proposed in #23 works for me.
Original comment by frobn...@gmail.com
on 3 Jun 2011 at 9:28
Great, that's what I'll go with.
Original comment by csilv...@gmail.com
on 4 Jun 2011 at 1:09
This is in perftools 1.8, just released.
Original comment by csilv...@gmail.com
on 16 Jul 2011 at 1:19
Original issue reported on code.google.com by
bjorn.k...@gmail.com
on 29 Mar 2010 at 9:38