Suggestion: track serving time memory leaks

GoogleCodeExporter commented 9 years ago

A server often leaks memory by unnecessarily bloating a global data
structure, such as a hash map. Such cases are very difficult to identify in
unit tests, because the unit test deletes the data structure before
exiting. Caching some data in memory is also a reasonable trade-off, as
long as the cache is limited in size.

Typical memory tracking tools cannot easily identify this scenario. There
is, however, one approach that worked very well in other companies. It is a
little bit complicated to explain, but I'll try.

The basic idea is that during a transaction, some memory is allocated and
freed, and some memory is allocated and cached. If you run a server for a
long time under load, cached data will likely either be available from a
previous transaction, and therefore not allocated at all, or be overwritten
by newly cached data over time. A common memory leak occurs when data in a
data structure is never overwritten, so the data structure keeps growing.

To identify this scenario, what you need is this:
1) Let the server run for some time under load (warm up)
2) Start marking all allocated memory
3) Let the server run for some time under load
4) Stop marking allocated memory, but keep tracking deleted memory
5) Wait for a relatively long time, to allow caches to be reused
6) Dump all the marked memory that was not deleted
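
As a rough sketch of the steps above: the class below is hypothetical and is
not part of any existing profiler API. It assumes that something (for
example, allocation and deallocation hooks) calls OnAlloc and OnFree for
every allocation and free, and that a short call-site string is available.
Allocations are recorded only while marking is on, frees are always
honored, and whatever remains at the end is the candidate leak set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tracker illustrating the mark / pause / dump procedure.
class MarkedAllocationTracker {
 public:
  struct Record {
    std::size_t size;
    std::string site;  // assumed: a label for where the allocation happened
  };

  // Step 2: begin recording new allocations.
  void StartMarking() { marking_ = true; }

  // Step 4: stop recording new allocations; frees are still tracked.
  void StopMarking() { marking_ = false; }

  // Called for every allocation. Only allocations made while marking
  // is on are remembered.
  void OnAlloc(const void* ptr, std::size_t size, const std::string& site) {
    assert(ptr != nullptr);
    if (marking_) marked_[ptr] = Record{size, site};
  }

  // Called for every free, in every phase. Erasing an unknown pointer
  // is a no-op, so unmarked memory is ignored.
  void OnFree(const void* ptr) { marked_.erase(ptr); }

  // Step 6: marked allocations that were never freed.
  std::vector<Record> Survivors() const {
    std::vector<Record> out;
    for (const auto& entry : marked_) out.push_back(entry.second);
    return out;
  }

  // Total size of the surviving marked allocations.
  std::size_t SurvivingBytes() const {
    std::size_t total = 0;
    for (const auto& entry : marked_) total += entry.second.size;
    return total;
  }

 private:
  bool marking_ = false;
  std::unordered_map<const void*, Record> marked_;
};
```

The server would call StartMarking() after warm up, StopMarking() after the
marking window, and Survivors() after the long wait in step 5. The sketch is
not thread-safe; a real server would need locking or per-thread state.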

It looks like we have most of the components to implement this, except that
we currently can only stop tracking memory, instead of pausing.

Original issue reported on code.google.com by ybenisr...@gmail.com on 18 Apr 2008 at 4:53

GoogleCodeExporter commented 9 years ago

By "pausing", do you mean step 4, where you keep tracking deleted memory?
You're right we don't have anything like that.

I think what you're suggesting can be done by taking two heap-profile
snapshots and comparing them, since the snapshots have all sorts of
information about where memory was allocated.  But I'm not certain.  Feel
free to play around with that.  If it doesn't work out, and you'd like to
patch the code to do what you describe here, we'd be very glad to take a
look at it!

Original comment by csilv...@gmail.com on 18 Apr 2008 at 8:51

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Reading this over again, I think the --base flag to pprof is what you want.
Just dump a heap-profile after the server has warmed up, and then dump
another heap-profile periodically, and compare it with --base=warmup.prof.
That should show what you need to know.
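
Conceptually, comparing a later profile against a base profile amounts to
subtracting the base from the later snapshot per allocation site. The
snippet below is only an illustration of that subtraction, not pprof's
actual implementation; the Snapshot type, the DiffSnapshots function and the
site names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumed shape of a snapshot: bytes in use, keyed by allocation site.
using Snapshot = std::map<std::string, std::int64_t>;

// Returns (later - base) for every site in either snapshot, dropping
// sites whose usage did not change.
Snapshot DiffSnapshots(const Snapshot& base, const Snapshot& later) {
  Snapshot delta = later;
  for (const auto& entry : base) {
    delta[entry.first] -= entry.second;  // sites missing in later go negative
  }
  for (auto it = delta.begin(); it != delta.end();) {
    if (it->second == 0) {
      it = delta.erase(it);
    } else {
      ++it;
    }
  }
  return delta;
}
```

Sites that keep growing from one periodic dump to the next, relative to the
warm-up snapshot, are the ones worth inspecting.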

I'm closing the bug, but if this doesn't serve your needs, let me know and
we can reopen it.

Original comment by csilv...@gmail.com on 6 Mar 2009 at 5:56

Changed state: NotABug
