davideuler / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

Need API to Free memory back to OS #275

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Our application runs on 64-bit SUSE Linux Enterprise Server 10.

The application allocates around 2 to 3 GB of data for specific calls.

We are linking to TCMALLOC version 1.5 and we see that memory keeps growing and 
is not returned back to OS.

We would like to have a way to return unused memory to OS at periodic 
intervals. Process cannot be recycled because it is a production system and we 
are communicating to other live applications.

We tried ReleaseFreeMemory(), but it is not reducing the heap size. Is there any 
way to return memory back to the OS?

Regards
Sundari.

Original issue reported on code.google.com by sunda...@gmail.com on 11 Oct 2010 at 10:08

GoogleCodeExporter commented 9 years ago
When you say it's using a lot of memory, do you mean virtual memory or physical 
memory?

Has there been any actual problem due to the memory growing?  Or just something 
you're concerned about?

The best way to figure out what's going on is to put in a call to 
MallocExtension::instance()->GetStats(), and print out the resulting buffer.  
This will tell us where tcmalloc thinks the memory is.

Original comment by csilv...@gmail.com on 11 Oct 2010 at 8:21

GoogleCodeExporter commented 9 years ago
Let me explain a little more.

We have multiple instances of the same process running on the Linux server. Our 
server has 32 GB RAM.

Each process in turn reads large files, gigabytes in size. When we read these 
large files, we cannot avoid the huge memory allocation. So the process ends 
up allocating around 1 GB of physical memory and 2 GB of virtual memory (a 
typical example).

Once we are done with the necessary operation, this process doesn't really have 
to occupy this much memory.

We need this because we have other processes that need the physical memory / 
virtual memory. Not all processes are active at the same time, but we expect 
the process to give up memory once it is done so that the memory can be used by 
other processes.

We typically have between 6 and 8 processes. Each can consume an average of 
3 GB per read loop.

What we have observed is that due to GPT, we are not able to give up memory and 
the process starts using more memory. So 3 GB becomes 6 GB and higher, and the 
other processes then eat up the swap and sometimes hang our systems.

I will run GetStats() and get an output for further discussion.

Sundari.

Original comment by sunda...@gmail.com on 12 Oct 2010 at 2:31

GoogleCodeExporter commented 9 years ago
Most of the bytes are in unmapped page heap after ReleaseFreeMemory call. Given 
below is the output of GetStats() before and after ReleaseFreeMemory call.

Initial Heap Size : 47448064

 Stats : ------------------------------------------------
MALLOC:     47448064 (   45.2 MB) Heap size
MALLOC:      2246192 (    2.1 MB) Bytes in use by application
MALLOC:     32825344 (   31.3 MB) Bytes free in page heap
MALLOC:     10260480 (    9.8 MB) Bytes unmapped in page heap
MALLOC:       939456 (    0.9 MB) Bytes free in central cache
MALLOC:            0 (    0.0 MB) Bytes free in transfer cache
MALLOC:      1176592 (    1.1 MB) Bytes free in thread caches
MALLOC:          998              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:      5373952 (    5.1 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------
 Stats After ReleaseFreeMemory: ------------------------------------------------
MALLOC:     47448064 (   45.2 MB) Heap size
MALLOC:      2246192 (    2.1 MB) Bytes in use by application
MALLOC:            0 (    0.0 MB) Bytes free in page heap
MALLOC:     43085824 (   41.1 MB) Bytes unmapped in page heap
MALLOC:       939456 (    0.9 MB) Bytes free in central cache
MALLOC:            0 (    0.0 MB) Bytes free in transfer cache
MALLOC:      1176592 (    1.1 MB) Bytes free in thread caches
MALLOC:          997              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:      5373952 (    5.1 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------
Heap Size After ReleaseFreeMemory: 47448064

Original comment by sunda...@gmail.com on 12 Oct 2010 at 2:54

GoogleCodeExporter commented 9 years ago
Using lots of virtual memory shouldn't be a problem: I think you have 48 bits 
of virtual memory on an x86_64 system?  Or maybe 47?  That's plenty.
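
For a sense of scale, the arithmetic behind the "48 or 47 bits" remark can be sketched as below. This assumes the common x86_64 layout with 4-level paging, where addresses are 48 bits wide and user space gets the lower half (47 bits). The names `kGiB` and `AddressSpaceGiB` are illustrative only.

```cpp
#include <cstdint>

// Bytes in one gibibyte.
constexpr std::uint64_t kGiB = 1ULL << 30;

// Size in GiB of an address space that is `bits` wide (valid for 30 <= bits < 64).
constexpr std::uint64_t AddressSpaceGiB(unsigned bits) {
  return (1ULL << bits) / kGiB;
}

// 48-bit addresses cover 256 TiB in total.
static_assert(AddressSpaceGiB(48) == 262144, "2^48 bytes is 262144 GiB");
// The 47-bit user-space half covers 128 TiB (assumption: 4-level paging).
static_assert(AddressSpaceGiB(47) == 131072, "2^47 bytes is 131072 GiB");
```

Either way, a few GB of reserved address space per process is a tiny fraction of what is available.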

Physical memory could be an issue.  However, GetStats is showing that the heap 
size is the same before and after ReleaseFreeMemory.  So it looks like it's 
doing what it ought to.  tcmalloc thinks the app has less than 100 MB of memory 
mapped.  Is top (or ps, or whatever you're using) showing more?

One issue could be overhead due to sampling.  In tcmalloc 1.6, I've changed the 
default to not sample at all.  Do you want to try upgrading to tcmalloc 1.6, 
and see if this fixes the problems you're seeing?  You could also just try 
running with the environment variable TCMALLOC_SAMPLE_PARAMETER=0.

Original comment by csilv...@gmail.com on 12 Oct 2010 at 10:12

GoogleCodeExporter commented 9 years ago
Thanks for the analysis. I am using version 1.6. Whatever I shared is from 1.6 
and my problem is I want to release the heap because I don't need it in the 
process!

Sundari.

Original comment by sunda...@gmail.com on 13 Oct 2010 at 4:54

GoogleCodeExporter commented 9 years ago
Unmapped bytes *are* released.  We'll clean up this wording in the next release 
to make it clearer -- I admit it's really confusing right now.  These are bytes 
that have been released to the OS (via an madvise call).  The stats you're 
showing me indicate everything is working like it should.

Just to be clear, when you said you were using tcmalloc 1.5 at the top of this 
bug report, that was a typo?  You're actually using 1.6?

} What we have observed is that due to GPT, we are not able to give up memory 
} and the process starts using more memory.. So 3GB becomes 6 and higher and 
} thereby the other processes just eat up the swap and hang our systems 
} sometimes.

Are you certain that's what's happening: processes are swapping because of the 
memory demands of the binaries?  Or is that just a hypothesis you have right 
now?  I want to be clear, because from what I'm seeing, that shouldn't be 
happening.

Original comment by csilv...@gmail.com on 13 Oct 2010 at 5:27

GoogleCodeExporter commented 9 years ago
One possibility is the madvise() is failing for you, so the bytes aren't 
actually being returned to the system properly.  You can test this by looking 
at src/system_alloc.cc, at the madvise call.  Right now we ignore return 
values, but you can look if it's -1 (and not EAGAIN), and maybe print something 
out then.  If you see that printout when you're running, then that's an 
interesting tidbit.
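
The check described above could look roughly like the sketch below. It is a standalone illustration, not the code in src/system_alloc.cc: `ReleaseAndReport` is a hypothetical helper name, and the choice of `MADV_DONTNEED` and of retrying on `EAGAIN` are assumptions made for the example.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of tcmalloc): asks the kernel to drop the
// pages in [start, start + length) and reports any failure other than EAGAIN.
// Returns true when the kernel accepted the request.
bool ReleaseAndReport(void* start, std::size_t length) {
  assert(start != nullptr);
  int result;
  do {
    result = madvise(start, length, MADV_DONTNEED);
  } while (result == -1 && errno == EAGAIN);  // EAGAIN is transient, so retry.

  if (result == -1) {
    // Any other error means the pages were NOT returned to the system.
    std::fprintf(stderr, "madvise(%p, %zu) failed: %s\n", start, length,
                 std::strerror(errno));
    return false;
  }
  return true;
}
```

If the message never appears while the application runs, the release path is working and the explanation has to be elsewhere.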

Original comment by csilv...@gmail.com on 13 Oct 2010 at 5:48

GoogleCodeExporter commented 9 years ago
Answering the thread for both your responses.

Our software is using GPT 1.5 but I happened to create a utility to simulate 
this issue. The utility uses 1.6. I forgot to mention that in my posts. Sorry 
about it.

Today we did some more characterization using the 1.6 version and the utility.

We verified that madvise() call is indeed returning 0. That shows there are no 
failures.

We also profiled a real large data allocation and free.

We were always looking at the HEAP SIZE value of GPT and did not pay attention 
to top output. Today we tried to correlate both.

Here is what we observed. The HEAP SIZE we see in GPT is close to the VIRT 
value reported in top for the process.

Our problem is that we see a LARGE value of VIRTUAL memory that the process is 
holding on to after the release call. Is there a way to free that?

Given below is the data :

Memory allocated in the process :

Top output :
-------------------------------
Virt : 1240m
RES : 1.1g

GPT output :
--------------------------------

Heap Size : 1267204096

 Stats : ------------------------------------------------
MALLOC:   1267204096 ( 1208.5 MB) Heap size
MALLOC:      2263680 (    2.2 MB) Bytes in use by application
MALLOC:   1260503040 ( 1202.1 MB) Bytes free in page heap
MALLOC:            0 (    0.0 MB) Bytes unmapped in page heap
MALLOC:      1777056 (    1.7 MB) Bytes free in central cache
MALLOC:        62464 (    0.1 MB) Bytes free in transfer cache
MALLOC:      2597856 (    2.5 MB) Bytes free in thread caches
MALLOC:         2205              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:     11141120 (   10.6 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------

AFTER CALLING RELEASEFREEMEMORY CALL

TOP OUTPUT
------------------------------------------------
RES : 19m
Virt : 1240m

MALLOC:   1267204096 ( 1208.5 MB) Heap size
MALLOC:      2263680 (    2.2 MB) Bytes in use by application
MALLOC:            0 (    0.0 MB) Bytes free in page heap
MALLOC:   1260503040 ( 1202.1 MB) Bytes unmapped in page heap
MALLOC:      1777056 (    1.7 MB) Bytes free in central cache
MALLOC:        62464 (    0.1 MB) Bytes free in transfer cache
MALLOC:      2597856 (    2.5 MB) Bytes free in thread caches
MALLOC:         2205              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:     11141120 (   10.6 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size

We see that RES memory did come down after release call. We also want to 
release the 1 GB VIRT memory that is currently being used up by the process.

Regards
Sundari.

Original comment by sunda...@gmail.com on 13 Oct 2010 at 9:05

GoogleCodeExporter commented 9 years ago
OK, sounds like things are working as they ought.  Even when we release the 
memory back to the system, it stays in our virtual address space for accounting 
purposes by the kernel.  However, no physical memory is used, and it should not 
cause any problems.
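
The behaviour described here (resident memory drops, virtual size stays) can be reproduced outside tcmalloc with a small Linux-only sketch. Everything in it is an assumption made for illustration: the helper names `ReadMemUsage` and `RunReleaseDemo`, the use of `/proc/self/statm` to read the same quantities top reports as VIRT and RES, and `MADV_DONTNEED` as the release mechanism.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Virtual and resident size of the current process, in bytes.
struct MemUsage {
  long virt_bytes;
  long res_bytes;
};

// Reads the first two fields of /proc/self/statm (total and resident pages).
MemUsage ReadMemUsage() {
  long virt_pages = 0;
  long res_pages = 0;
  std::FILE* f = std::fopen("/proc/self/statm", "r");
  assert(f != nullptr);
  const int fields = std::fscanf(f, "%ld %ld", &virt_pages, &res_pages);
  std::fclose(f);
  assert(fields == 2);
  (void)fields;  // Silence unused warnings when asserts are compiled out.
  const long page = sysconf(_SC_PAGESIZE);
  return {virt_pages * page, res_pages * page};
}

// Usage snapshots taken after touching the pages and after releasing them.
struct DemoResult {
  MemUsage touched;
  MemUsage released;
};

// Maps `bytes` of anonymous memory, touches every page, then hands the pages
// back with madvise while keeping the address range mapped.
DemoResult RunReleaseDemo(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(p != MAP_FAILED);
  std::memset(p, 1, bytes);  // Touching the pages makes them resident.
  const MemUsage touched = ReadMemUsage();

  const int rc = madvise(p, bytes, MADV_DONTNEED);
  assert(rc == 0);
  (void)rc;
  const MemUsage released = ReadMemUsage();

  munmap(p, bytes);  // Only munmap would shrink the virtual size.
  return {touched, released};
}
```

Comparing `touched` and `released` should show the same pattern as the top output earlier in this thread: the resident size falls by roughly the size of the region, while the virtual size does not change.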

Are you actually seeing problems in practice (with tcmalloc 1.6)?  Or are you 
just seeing these big numbers and being concerned?  If you are seeing problems, 
what problems are you seeing, precisely?

Original comment by csilv...@gmail.com on 13 Oct 2010 at 7:20

GoogleCodeExporter commented 9 years ago
Thanks Silver. All problems we saw were with GPT 1.4. We have not upgraded to 
GPT 1.6 as yet.

We started this exercise because one of our Linux servers, which ran 8 
processes, hung after running out of swap space.

Our Application uses GPT 1.4. And we also don't call ReleaseFreeMemory(). We 
started this exercise to see if there are ways to reduce the memory foot print 
per process.

Initially we were not even sure where the problem was (whether there were 
memory leaks)

As these are production systems, a GPT upgrade might not be possible 
immediately. We want to keep the version at 1.4 if possible for this product 
family.

I ran the same process with GPT 1.4. I don't see the mapped and unmapped bytes 
after free, because GetStats() in GPT 1.4 just reports free bytes in the heap. 
But the memory definitely goes down, similar to GPT 1.6.

I will introduce the ReleaseFreeMemory() call in our application so that we 
give back memory to OS.
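
One possible shape for this, sketched under assumptions: a small throttle that the application calls at the end of each read loop, which invokes a release callback at most once per interval. `ReleaseThrottle` is a hypothetical helper, not something gperftools provides, and the callback is left generic so the sketch compiles without the library. In the real application the callback would be expected to call `MallocExtension::instance()->ReleaseFreeMemory()`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: runs `release` at most once per `min_interval`.
class ReleaseThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  ReleaseThrottle(std::function<void()> release, Clock::duration min_interval)
      : release_(std::move(release)), min_interval_(min_interval) {
    assert(static_cast<bool>(release_));
  }

  // Call after each large read loop. Returns true if `release` was invoked.
  // `now` is a parameter so the timing logic can be exercised in tests.
  bool MaybeRelease(Clock::time_point now = Clock::now()) {
    if (has_run_ && now - last_run_ < min_interval_) {
      return false;  // Released recently; skip this round.
    }
    release_();
    last_run_ = now;
    has_run_ = true;
    return true;
  }

 private:
  std::function<void()> release_;
  Clock::duration min_interval_;
  Clock::time_point last_run_{};
  bool has_run_ = false;
};
```

Calling the release from the application's own loop, rather than from a timer thread, keeps it at a point where the large buffers are known to have been freed already.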

One question I still have is bytes remaining in VIRTUAL ADDRESS SPACE. Our 
servers have 6 GB of SWAP allocated. They have 32 GB RAM.

If we run 8 processes and each process reserves 1 GB of SWAP space, will we run 
out of swap? I would like to understand the implications of this scenario.

ReleaseFreeMemory() seems to solve the issue with respect to physical memory. 

Thanks a ton for your immediate response and support! Greatly appreciated!

Regards
Sundari

Original comment by sunda...@gmail.com on 14 Oct 2010 at 8:37

GoogleCodeExporter commented 9 years ago
Just to be clear, virtual memory is not the same as swap.  Assuming you're on a 
64-bit machine, you have (I think) 64000 gigabytes of virtual memory, so you're 
not likely to be running out of it.

The tcmalloc stats report virtual memory use (which is what userspace typically 
gets to see).  The stuff in 'unmapped in page heap' is definitely *not* taking 
physical memory.  If you're seeing lots of physical memory being used, it must 
be from the other numbers.
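
To make "the other numbers" concrete, the sketch below pulls two lines out of a stats dump in the format pasted earlier in this thread and subtracts the unmapped bytes from the heap size. The helper names `StatBytes` and `MappedHeapBytes` are made up for the example, the parser assumes the 1.6-era "MALLOC: <bytes> (...) <label>" layout, and the result leaves out the separate "Metadata allocated" line.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Returns the byte count from the first "MALLOC:" line containing `label`,
// or -1 if no such line is found.
std::int64_t StatBytes(const std::string& stats, const std::string& label) {
  assert(!label.empty());
  std::istringstream lines(stats);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.find(label) == std::string::npos) continue;
    std::istringstream fields(line);
    std::string prefix;
    std::int64_t bytes = 0;
    if (fields >> prefix >> bytes && prefix == "MALLOC:") return bytes;
  }
  return -1;
}

// Heap bytes that can still be backed by physical memory: the heap size minus
// what has already been released ("unmapped"). Returns -1 on a parse failure.
std::int64_t MappedHeapBytes(const std::string& stats) {
  const std::int64_t heap = StatBytes(stats, "Heap size");
  const std::int64_t unmapped = StatBytes(stats, "Bytes unmapped in page heap");
  if (heap < 0 || unmapped < 0) return -1;
  return heap - unmapped;
}
```

Applied to the first "Stats After ReleaseFreeMemory" dump above, this gives 47448064 - 43085824 = 4362240 bytes, which matches the sum of the in-use, central cache, and thread cache lines.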

} I will introduce the ReleaseFreeMemory() call in our application so that we 
} give back memory to OS.

That's a good idea.  We should emphasize that more in the docs.  I'll try to 
figure out the right wording.

} As these are production systems, GPT upgrade might not be possible 
} immediately. We want to keep the version to 1.4 if possible for this product 
} family.

That should be fine.  You can try setting the environment variable 
TCMALLOC_SAMPLE_PARAMETER=0 before running your program, and see if that helps.

Original comment by csilv...@gmail.com on 14 Oct 2010 at 9:29

GoogleCodeExporter commented 9 years ago
Closing this bug -- I don't think tcmalloc is doing anything wrong here.  The 
wording of the memory-use message has been improved since perftools 1.6, to 
make it clearer that virtual memory use isn't causing any problems.

I suspect the sampling is what's really causing issues here, since it doesn't 
show up in the tcmalloc memory use output.  Since we turn off sampling by 
default in the latest perftools, that could be considered resolved now too. :-)

Original comment by csilv...@gmail.com on 1 Sep 2011 at 1:53