tcmalloc crashes when small objects are used to override available address space

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Allocate all available memory
2. After malloc() returns NULL, try to recover by releasing some memory back to the free area
3. Repeat 1,2
4. tcmalloc will crash if step 1 uses small blocks (1kB). tcmalloc will survive if step 1 allocates large blocks (1MB).
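
The steps above can be sketched roughly as follows. This is not the attached test itself, only a minimal Linux-only illustration: the function names, the 1 MiB backup block size and the way the RLIMIT_AS cap is chosen (current virtual size plus some headroom) are all assumptions made for the example.

```cpp
#include <sys/resource.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the attached test; all names and sizes are
// illustrative. Linux only (reads /proc/self/statm).
struct Node { Node* next; };

// Current virtual size of this process in bytes, or 0 on failure.
static std::size_t current_vm_bytes() {
    unsigned long pages = 0;
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (f == nullptr) return 0;
    if (std::fscanf(f, "%lu", &pages) != 1) pages = 0;
    std::fclose(f);
    return static_cast<std::size_t>(pages) *
           static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Step 1: allocate until malloc() returns NULL. Blocks are chained through
// their own first bytes, so the bookkeeping needs no extra memory.
static Node* exhaust(std::size_t block_size, Node* head) {
    for (;;) {
        Node* n = static_cast<Node*>(std::malloc(block_size));
        if (n == nullptr) return head;
        n->next = head;
        head = n;
    }
}

// Runs steps 1-3 under a temporary RLIMIT_AS cap of (current size + headroom).
// Returns the number of backup blocks released, or -1 if setup failed.
int run_oom_rounds(std::size_t block_size, int backup_count, std::size_t headroom) {
    const std::size_t kBackupSize = 1u << 20;  // 1 MiB per backup block
    const int kMaxBackups = 16;
    if (block_size < sizeof(Node) || backup_count < 0 || backup_count > kMaxBackups)
        return -1;

    struct rlimit old_limit;
    const std::size_t vm = current_vm_bytes();
    if (vm == 0 || getrlimit(RLIMIT_AS, &old_limit) != 0) return -1;

    // Lower only the soft limit, and never above what was already allowed.
    struct rlimit capped = old_limit;
    capped.rlim_cur = vm + headroom;
    if (old_limit.rlim_cur != RLIM_INFINITY && capped.rlim_cur > old_limit.rlim_cur)
        capped.rlim_cur = old_limit.rlim_cur;
    if (setrlimit(RLIMIT_AS, &capped) != 0) return -1;

    // Reserve the backup blocks before memory runs out.
    void* backups[kMaxBackups] = {};
    for (int i = 0; i < backup_count; ++i) backups[i] = std::malloc(kBackupSize);

    Node* head = nullptr;
    int released = 0;
    for (;;) {
        head = exhaust(block_size, head);            // step 1
        if (released == backup_count) break;
        std::fputs("releasing backup block..\n", stderr);
        std::free(backups[released++]);              // step 2, then repeat (step 3)
    }
    std::fputs("All backup blocks released, exiting..\n", stderr);

    // Clean up and restore the original soft limit.
    while (head != nullptr) {
        Node* next = head->next;
        std::free(head);
        head = next;
    }
    setrlimit(RLIMIT_AS, &old_limit);
    return released;
}
```

Calling `run_oom_rounds(1024, 3, 64u << 20)` corresponds to the small-block case and `run_oom_rounds(1u << 20, 3, 64u << 20)` to the large-block case. Whether the small-block run aborts depends on which allocator is linked in.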

What is the expected output? What do you see instead?
Expected output:
releasing backup block..
releasing backup block..
releasing backup block..
All backup blocks released, exiting..

I get instead:
src/central_freelist.cc:273] allocation failed: 12
releasing backup block..
src/central_freelist.cc:273] allocation failed: 12
releasing backup block..
src/page_heap_allocator.h:66] assertion failed: free_area_ != NULL
Aborted

What version of the product are you using? On what operating system?
google-perftools-1.3
RedHat EL 5.2 (Tikanga)

Please provide any additional information below.
Attached is the test application that I'm using. It has a SMALL_BLOCK switch that enables or disables the use of small blocks; tcmalloc fails when the switch is turned on.

Original issue reported on code.google.com by pet.kh...@gmail.com on 24 Mar 2010 at 2:19

Attachments:

tcmalloc_override_test.cpp

GoogleCodeExporter commented 9 years ago

I thought we already had an issue report for this, but I can't find it.  The issue is that tcmalloc can't get enough memory for some of its internal data structures.  You see this crash when there's enough memory for the malloc request, but not enough memory for the malloc request + tcmalloc internal overhead.  That's why you see it on small malloc requests when you're near the memory limit, but not for large malloc requests when you're far away from the memory limit.

My feeling is that when you're that close to running out of memory, it's very difficult to have the program work properly (every system call can return ENOMEM, even those one normally never thinks to check the error code of).  So it's low priority for me to fix it, since I don't think it helps many real-world apps that much.  Did you discover this problem in an actual executable, or only in a memory stress-test?

Original comment by csilv...@gmail.com on 25 Mar 2010 at 5:49

Changed state: Accepted
Added labels: Priority-Medium, Type-Defect

GoogleCodeExporter commented 9 years ago

Craig,
I deeply respect your work and the tcmalloc project. However, I sense a slight prejudice or bias when it comes to out-of-memory situations. For example, the opinion that in an OOM situation it's very difficult to have the program work properly. Objectively, it's not true. When OOM occurs, you can free some resources that are not needed. It's a thing that just can be done, and in 100% of cases it allows any application to continue functioning, provided it has some unneeded resources that can be released. This is a fact.

I've solved my current issues by raising the RLIMIT_AS limit instead of freeing resources - tcmalloc gets through the OOM situation without observable problems.
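
For illustration, raising the soft RLIMIT_AS limit from inside the process could look roughly like the sketch below. The helper name `raise_soft_limit` and the idea of raising by a fixed number of bytes are assumptions made for the example; only `getrlimit`, `setrlimit` and `RLIMIT_AS` are the real POSIX interface.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft limit of `resource` by `extra` bytes,
// clamped to the hard limit. Returns true if the soft limit was raised or
// was already unlimited, false otherwise.
bool raise_soft_limit(int resource, rlim_t extra) {
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0) return false;
    if (lim.rlim_cur == RLIM_INFINITY) return true;   // nothing to raise

    rlim_t target = lim.rlim_cur + extra;
    if (target < lim.rlim_cur) target = lim.rlim_max;  // arithmetic overflow
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max)
        target = lim.rlim_max;                         // cannot exceed the hard limit
    if (target == lim.rlim_cur) return false;          // already at the hard limit

    lim.rlim_cur = target;
    return setrlimit(resource, &lim) == 0;
}
```

An OOM handler could call `raise_soft_limit(RLIMIT_AS, 64u << 20)` and then retry the failed allocation. An unprivileged process can only raise the soft limit up to the hard limit, so this works only as long as some room is left between the two.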

Original comment by pet.kh...@gmail.com on 26 Mar 2010 at 12:32

GoogleCodeExporter commented 9 years ago

Thank you for your feedback.  You're right you can free resources when you notice an OOM.  The problem is detecting the OOM in the first place.  You have to be an extraordinarily careful programmer to do that -- lots of libc calls need a little bit of memory, and if you forget to check errno after just one of them, your program is likely to break if you constantly are running close to your memory limit.  I'm not saying it can't be done, I'm just saying it's very difficult, and tcmalloc is just one of many problems you face.

That said, I'm not defending tcmalloc here -- it's not great it crashes in these situations.  My point is merely that even if tcmalloc is fixed, you've still got a very fragile situation, and if you can raise the memory limits (as you did), you'll be in a much more comfortable situation.

Original comment by csilv...@gmail.com on 26 Mar 2010 at 3:00

GoogleCodeExporter commented 9 years ago

I detect the OOM by wrapping the malloc call using a linker mechanism. This guarantees that the OOM will always be detected. It also gives complete control over the situation - you can detect the OOM in any running thread and stop the threads instantly, without allowing them to continue before the situation is resolved.

I want to show that this can be done and that there is no reason to feel powerless. The code is simple and can be written in one day. The "OOM-anxiety" is totally irrational. The only obstacle is: tcmalloc.

If tcmalloc has a problem getting out of OOM, it can ask the application for help. I want to show that the guarantees from tcmalloc do not have to be like:

* "You can always, in 100% of cases, get out of OOM without any problems."

They can be loosened a little, but remain DETERMINISTIC (!):

* "You can get out of OOM if you provide tcmalloc with a buffer of 4096 bytes each time an OOM occurs."

This way tcmalloc shares responsibility for the situation with the application. A fair contract, isn't it?

Thanks to this, tcmalloc will be more enterprise-ready: it will have one less undefined behavior.

Original comment by pet.kh...@gmail.com on 23 Apr 2010 at 2:50

GoogleCodeExporter commented 9 years ago

It's great to see such passion devoted to perftools!  The worst thing that can befall an open-source project is when nobody cares how it behaves.

For this particular issue, we're happy to accept a patch if you want to write something up.

Original comment by csilv...@gmail.com on 23 Apr 2010 at 5:22

GoogleCodeExporter commented 9 years ago

It's not devotion, it's dependency.. But let me explain.

I wrote that tcmalloc is an obstacle. Well, that is not completely true. Tcmalloc has a problem getting out of OOM by freeing memory, i.e. once OOM is detected, releasing memory doesn't always help (as the test app I attached proves). This is the problem - tcmalloc doesn't accept the released memory. Once it gets into the post-OOM state, it won't accept released memory even if the client application releases virtually all allocated memory.

BUT it will accept a raise of the resource limit (RLIMIT_AS etc.). This is how I'm currently getting out of OOM with a 99% success rate.

The reason I'm writing this post, and wrote the previous one, is to raise awareness that the ability to get out of OOM is important, so that the current, somewhat acceptable state does not get worse. I'm using it in one of my products. A key debugging feature depends on it. So it's not devotion, it's care for my product.. :) But maybe that is more important?

Original comment by pet.kh...@gmail.com on 23 Apr 2010 at 11:51

GoogleCodeExporter commented 9 years ago

It's been over a year, so I'm going to close this WillNotFix.  If anyone wants to write up a patch, feel free to reopen this issue.

Original comment by csilv...@gmail.com on 31 Aug 2011 at 11:35

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

With TcMalloc you can release physical memory (by ReleaseMemory or TCMALLOC_MEMORY_RELEASE_RATE), but you cannot release virtual memory (address space).

So RLIMIT_AS will not work well.
Instead, I would suggest using the physical memory limit maintained by TcMalloc itself (as the OS is unable to do it well).
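
The difference between the two kinds of release can be illustrated outside TcMalloc. The sketch below is Linux-specific and is not TcMalloc code; it assumes that physical pages are handed back with a call like madvise(MADV_DONTNEED), which is an assumption about the mechanism and not something stated in this thread. The function name is made up for the example.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Linux-specific illustration, not TcMalloc code.
// Returns true if the range stayed mapped (and readable) after its physical
// pages were dropped.
bool physical_release_keeps_mapping() {
    const std::size_t len = 1u << 20;  // 1 MiB
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    unsigned char* p = static_cast<unsigned char*>(mem);

    std::memset(p, 0xAB, len);  // touch every page: physical memory is now in use

    // Give the physical pages back; the virtual range itself stays mapped.
    const bool dropped = (madvise(p, len, MADV_DONTNEED) == 0);

    // The mapping still exists and still counts against RLIMIT_AS. On Linux,
    // private anonymous pages read back as zero after MADV_DONTNEED.
    const bool still_mapped = dropped && p[0] == 0 && p[len - 1] == 0;

    munmap(p, len);  // only unmapping returns the address space
    return still_mapped;
}
```

After the madvise call the process uses less physical memory but the same amount of address space, which is why a limit on address space does not see the release.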

I have provided the patch in issue 448 (http://code.google.com/p/gperftools/issues/detail?id=448).

Original comment by pafi...@gmail.com on 9 Oct 2012 at 8:11
