giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

Mapped memory regions used by process #260

Closed: giampaolo closed this issue 10 years ago

giampaolo commented 10 years ago

From g.rodola on April 21, 2012 21:59:23

This might be a nice addition, see:
http://stackoverflow.com/questions/2184775/getting-a-list-of-used-libraries-by-a-running-process-unix
The way I see it, the Process class should gain a new get_shared_libs() method
returning a list of namedtuples including:

- absolute library path (e.g. /lib/x86_64-linux-gnu/libexpat.so.1)
- address (e.g. 00007f9babf44000)
- mode (e.g. "rw---" or the numeric representation)
- rss (memory resident set size)

Also, if the new method is invoked as such:

>>> get_shared_libs(extended=True)

...we can provide some extra platform-dependent fields such as pss/swap/... 
memory, inode, device no, etc (see "pmap -x PID" output on Linux and "man 
proc", sections /proc/[pid]/maps and /proc/[pid]/smaps).
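On Linux, /proc/[pid]/maps lists one mapping per line: address range, permissions, offset, device, inode and an optional path. A minimal parsing sketch, assuming the column layout documented in "man proc"; MapEntry, parse_maps_line and read_maps are hypothetical names for illustration, not part of psutil:

```python
from collections import namedtuple

# Hypothetical record type; fields mirror the columns of /proc/[pid]/maps.
MapEntry = namedtuple("MapEntry", "addr perms offset dev inode path")


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a MapEntry."""
    # Columns: address perms offset dev inode [pathname]. The pathname is
    # empty for anonymous mappings and may contain spaces, hence maxsplit=5.
    fields = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5] if len(fields) == 6 else ""
    return MapEntry(addr, perms, int(offset, 16), dev, int(inode), path)


def read_maps(pid="self"):
    """Return every mapping of a process (Linux only)."""
    with open("/proc/%s/maps" % pid) as f:
        return [parse_maps_line(line) for line in f]
```

Pseudo-paths such as "[heap]" and "[stack]" come through unchanged in the path field, and anonymous mappings get an empty path.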

For the Windows implementation we can look here: 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682621(v=vs.85).aspx 
Judging from process hacker it seems we can retrieve all the fields listed 
above except "mode".

For BSD / OSX see "man pmap", as it seems to use the right syscalls for this
kind of job.

Original issue: http://code.google.com/p/psutil/issues/detail?id=260

giampaolo commented 10 years ago

From wj32...@gmail.com on April 23, 2012 04:59:04

Do you also want WOW64 DLLs on 64-bit Windows systems?
giampaolo commented 10 years ago

From g.rodola on April 23, 2012 05:02:13

I suppose it makes sense, yes, unless it adds too much complexity.
giampaolo commented 10 years ago

From g.rodola on April 23, 2012 10:49:32

Preliminary Linux patch attached.

Attachment: linux-sharedlibs.patch

giampaolo commented 10 years ago

From wj32...@gmail.com on April 23, 2012 15:54:22

What's the point of the "mode" field? Different sections of a mapped image are 
going to have different protection bits set.
giampaolo commented 10 years ago

From wj32...@gmail.com on April 24, 2012 03:58:33

Completely untested patch.

Attachment: modules.patch

giampaolo commented 10 years ago

From g.rodola on April 24, 2012 18:58:38

On second thought, I think we're better off calling this function
"get_memory_mappings()", since the stuff we're extracting is not related to
shared libraries only.
On Linux, for example, we have "heap" and "anonymous" maps, which refer to
application private data; the same goes for other OSes, including Windows.
Looking at process hacker, its process "memory" tab seems to be
exactly what we're looking for (it also shows what looks like a "mode" column).

wj32.64, thanks for your patch; do you think you could adapt it to also
include non-shared-lib mappings?
In process hacker I see "Private", "Mapped", "Free" and other stuff. That's
what we're looking for, and it should go in the "path" field.
Does that come from the MODULEENTRY32 struct? If so I can adapt the patch myself.

Thanks in advance.
giampaolo commented 10 years ago

From wj32...@gmail.com on April 24, 2012 19:08:41

Sorry, I misunderstood the purpose of this feature, seeing as it's called 
"Shared libraries". It's easy to query all memory mappings in an address space, 
but it takes a lot more effort to get a list of heaps, stacks, etc.
giampaolo commented 10 years ago

From g.rodola on April 24, 2012 20:33:51

> Sorry, I misunderstood the purpose of this feature,

It's my fault as at first I wasn't clear on the technical details involved.

> It's easy to query all memory mappings in an address space

Can you provide some more info? Is your patch still a valid starting point?

> it takes a lot more effort to get a list of heaps, stacks, etc.

Never mind, I can live with that.
The main reason this is needed is to provide more detailed process memory
usage on Linux, as explained here:
http://bmaurer.blogspot.it/2006/03/memory-usage-with-smaps.html
Maybe that doesn't apply to Windows.
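The smaps file that post relies on repeats each maps header line and follows it with per-mapping counters such as Rss, Shared_Clean and Private_Dirty, in kB. A rough sketch of splitting it into one dict per region, assuming that layout (see "man proc"); parse_smaps is a hypothetical helper:

```python
def parse_smaps(text):
    """Split /proc/[pid]/smaps text into one dict per mapped region."""
    regions = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].endswith(":"):
            # "Key:   value kB" line belonging to the current region; keep
            # numeric counters only (this skips e.g. "VmFlags: rd wr").
            if current is not None and len(fields) >= 2 and fields[1].isdigit():
                current[fields[0][:-1]] = int(fields[1])
        else:
            # Header line: address perms offset dev inode [pathname]
            parts = line.rstrip().split(None, 5)
            current = {"addr": parts[0], "perms": parts[1],
                       "path": parts[5] if len(parts) == 6 else ""}
            regions.append(current)
    return regions
```

Summing one counter, for example r.get("Private_Dirty", 0), over all regions gives a per-process total for that counter.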
giampaolo commented 10 years ago

From wj32...@gmail.com on April 24, 2012 20:56:46

I'll need to rewrite the patch. It's a completely different API to query memory regions.
giampaolo commented 10 years ago

From g.rodola on April 25, 2012 19:08:40

Linux implementation committed as r1301.
The API I chose is get_memory_maps(grouped=True).
Example:

>>> get_memory_maps(grouped=True)
[mmap(path='/lib/x86_64-linux-gnu/libutil-2.13.so', rss=16384, shared=2109440, ...),
 mmap(path='/usr/lib/python2.7/lib-dynload/_heapq.so', rss=16384, shared=2109440, ...),
 mmap(path='[heap]', rss=16384, shared=2109440, ...),
 ...]
>>> get_memory_maps(grouped=False)
[mmap(addr='00400000-00633000', perms='r-xp', path='/lib/x86_64-linux-gnu/libutil-2.13.so', rss=16384, shared=2109440, ...),
 mmap(addr='00400000-00633000', perms='r-x-', path='/usr/lib/python2.7/lib-dynload/_heapq.so', rss=16384, shared=2109440, ...),
 mmap(addr='00400000-00633000', perms='rw--', path='[heap]', rss=16384, shared=2109440, ...),
 ...]
>>>

I think that, by default, there's really no point in returning multiple entries
for the same memory region that differ only in "mode" and "address".
Both "mode" and "address" are very low-level details that most people shouldn't
be interested in, so by default we drop them and *group* all the memory
mappings that share the same path.
This article reflects exactly this idea:
http://people.redhat.com/berrange/olpc/performance/epiphany/
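The grouping described above can be sketched as follows, assuming each ungrouped entry carries a path and an rss value. Region, GroupedRegion and group_by_path are hypothetical names, and only rss is summed here, whereas the real mmap tuples carry more counters:

```python
from collections import OrderedDict, namedtuple

# Hypothetical record types for illustration only.
Region = namedtuple("Region", "addr perms path rss")
GroupedRegion = namedtuple("GroupedRegion", "path rss")


def group_by_path(regions):
    """Merge regions sharing a path: sum rss, drop addr and perms."""
    totals = OrderedDict()  # keeps the first-seen order of paths
    for r in regions:
        totals[r.path] = totals.get(r.path, 0) + r.rss
    return [GroupedRegion(path, rss) for path, rss in totals.items()]


regions = [
    Region("00400000-0040b000", "r-xp", "/bin/cat", 40960),
    Region("0060a000-0060b000", "rw-p", "/bin/cat", 4096),
    Region("01b6a000-01b8b000", "rw-p", "[heap]", 8192),
]
print(group_by_path(regions))
# [GroupedRegion(path='/bin/cat', rss=45056), GroupedRegion(path='[heap]', rss=8192)]
```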

Summary: Mapped memory regions used by process
Status: Started
Labels: Milestone-0.5.0 Progress-1in4

giampaolo commented 10 years ago

From wj32...@gmail.com on April 25, 2012 19:17:09

New patch (don't know if it compiles)

Attachment: memory.patch

giampaolo commented 10 years ago

From g.rodola on April 26, 2012 04:37:34

FreeBSD implementation committed in r1302.

Labels: -Progress-1in4 Progress-2in4

giampaolo commented 10 years ago

From g.rodola on April 26, 2012 05:20:16

wj32.64, thanks for your patch, but now that I look at what it is able to
extract, it appears quite insufficient compared to process hacker.
I think we can use your first patch, provide only the process's shared modules,
and document this limitation.
giampaolo commented 10 years ago

From wj32...@gmail.com on April 26, 2012 05:24:53

What do you mean? It should be giving you the same things that PH gives you.
giampaolo commented 10 years ago

From g.rodola on April 26, 2012 06:04:54

Oh right, this line:

if (basicInfo.Type == MEM_MAPPED)

...was preventing the interesting stuff from being shown. =)
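Why that one check hid so much: each region described by a MEMORY_BASIC_INFORMATION structure (as filled in by VirtualQueryEx(), for instance) has a State and a Type, and MEM_MAPPED is only one of the possible types. A sketch using the winnt.h constants on made-up sample regions; region_label and visible_regions are hypothetical helpers, not code from the patch, and the labels only roughly follow Process Hacker's:

```python
# Constants from the Windows SDK (winnt.h) used in MEMORY_BASIC_INFORMATION.
MEM_COMMIT = 0x1000     # State: committed pages
MEM_FREE = 0x10000      # State: free (unallocated) range
MEM_PRIVATE = 0x20000   # Type: private memory (heaps, stacks, ...)
MEM_MAPPED = 0x40000    # Type: view of a mapped section (e.g. a mapped file)
MEM_IMAGE = 0x1000000   # Type: view of an executable image (EXE/DLL)

TYPE_LABELS = {MEM_PRIVATE: "Private", MEM_MAPPED: "Mapped", MEM_IMAGE: "Image"}


def region_label(state, region_type):
    """Return a display label for one (state, type) pair."""
    if state == MEM_FREE:
        return "Free"
    return TYPE_LABELS.get(region_type, "Unknown")


def visible_regions(regions, mapped_only):
    """Labels of (state, type) pairs, optionally keeping only MEM_MAPPED ones."""
    return [region_label(s, t) for s, t in regions
            if not mapped_only or t == MEM_MAPPED]


sample = [(MEM_COMMIT, MEM_IMAGE), (MEM_COMMIT, MEM_PRIVATE),
          (MEM_COMMIT, MEM_MAPPED), (MEM_FREE, 0)]
print(visible_regions(sample, mapped_only=True))   # ['Mapped']
print(visible_regions(sample, mapped_only=False))  # ['Image', 'Private', 'Mapped', 'Free']
```

With the filter on, image regions (the DLLs) and private regions (heaps, stacks) are all dropped.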
giampaolo commented 10 years ago

From g.rodola on April 26, 2012 07:28:49

Windows implementation committed as r1303.
Thanks a lot wj32.64.

Labels: -Progress-2in4 Progress-3in4

giampaolo commented 10 years ago

From g.rodola on May 10, 2012 10:21:58

Attached is a partially working OSX implementation.
I was able to retrieve memory addresses and memory consumption, but I'm not able
to determine the *name* of the memory regions.

It seems we can use proc_regionfilename() from /usr/include/libproc.h but I was 
not able to make it work: 
http://prod.lists.apple.com/archives/darwin-kernel/2012/Apr/msg00024.html

Attachment: osx.patch

giampaolo commented 10 years ago

From g.rodola on May 10, 2012 10:28:44

Assigning this one to Jeremy as per his request.

Owner: jcscoob...@gmail.com

giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 25, 2012 15:36:54

Documenting for posterity: 
http://lists.apple.com/archives/darwin-kernel/2007/Jun/msg00056.html
giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 25, 2012 23:51:45

Documenting for posterity (seems to have some stuff in it that could be used to
get things working):
http://www.opensource.apple.com/source/gdb/gdb-413/src/gdb/macosx/macosx-nat-dyld.c
giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 30, 2012 09:46:36

As of last night, here is my progress:

mmap(path='/usr/local/Cellar/mongodb/2.0.4-x86_64/bin/mongod', rss=45355008, private=17870848, ref_count=5, shadow_count=1)
mmap(path='/usr/lib/dyld', rss=4938358784, private=249856, ref_count=322, shadow_count=1)
mmap(path='COW', rss=15459876864, private=227119104, ref_count=9891, shadow_count=21)
mmap(path='ALI', rss=22663168, private=4096, ref_count=96, shadow_count=0)
mmap(path='/private/var/db/dyld/dyld_shared_cache_x86_64', rss=9542041600, private=5718016, ref_count=4345, shadow_count=6)
mmap(path='NUL', rss=2196004864, private=0, ref_count=0, shadow_count=0)
mmap(path='PRV', rss=2868117504, private=901120, ref_count=25, shadow_count=0)
mmap(path='???', rss=1612718080, private=36864, ref_count=5, shadow_count=0)

I'm not 100% certain it's right, and the paths for non-image regions are likely
not ideal (I stole them from how top displays them).
giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 30, 2012 09:49:11

You'll also notice that the 5th entry is not a dylib but is instead a path to 
the DYLD shared cache.  I'm looking at how to get the real path.
giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 30, 2012 11:07:32

I'm attaching an updated (not finished) patch.  It makes a full round trip, so
what remains to be done is:

* Make sure the memory numbers/counts are accurate
* Figure out a way to get the real dylib from the dyld_shared_cache
* Make sure the paths for non-image regions are correct

Attachment: psutil_Issue_260_osx_v2.diff

giampaolo commented 10 years ago

From jcscoob...@gmail.com on May 30, 2012 11:33:22

It looks like, to get the dylib name from the dyld_shared_cache, we'll need to
write a parser for the dyld_shared_cache_*.map file and then look up the dylib
filename based on the memory address of the region.
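The lookup half of that idea is an interval search by address. A sketch, assuming the .map file has already been parsed into non-overlapping (start, end, path) triples (its actual format is not covered here); AddressIndex is a hypothetical class, and the addresses and paths below are made up:

```python
import bisect


class AddressIndex(object):
    """Find which image's [start, end) range contains a given address."""

    def __init__(self, ranges):
        # ranges: iterable of (start, end, path), assumed non-overlapping.
        self._ranges = sorted(ranges)
        self._starts = [start for start, _, _ in self._ranges]

    def lookup(self, addr):
        # Rightmost range whose start is <= addr, then check its end bound.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, path = self._ranges[i]
            if addr < end:
                return path
        return None  # address not covered by any known image


index = AddressIndex([
    (0x7fff80000000, 0x7fff80100000, "/usr/lib/libA.dylib"),  # made-up data
    (0x7fff80100000, 0x7fff80180000, "/usr/lib/libB.dylib"),
])
print(index.lookup(0x7fff80120000))  # /usr/lib/libB.dylib
```

Sorting once and bisecting keeps each lookup at O(log n), which matters because a process can have many regions falling inside the shared cache.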
giampaolo commented 10 years ago

From g.rodola on June 03, 2012 15:44:01

Issue 96 has been merged into this issue.
giampaolo commented 10 years ago

From g.rodola on June 20, 2012 07:05:33

I took a look at this and verified that RSS memory and address matches vmmap (yay!).
There's a problem with some paths though.

Here:

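import os
import psutil

p = psutil.Process(os.getpid())
for m in p.get_memory_maps(grouped=False):
    # Skip pseudo-paths such as [heap]/[stack]; report paths missing on disk.
    if not m.path.startswith('[') and not os.path.exists(m.path):
        print("%s %s %s %s" % (m.addr, m.rss // 1024, m.perms, repr(m.path)))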

...it prints:

bash-3.2$ python foo.py 
0000000000000000-0000000000001000 0 ---/--- '/usr/local/bin/python2.7p\xbc\xe7'
0000000000383000-00000000003d0000 248 r--/rwx '/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation\xe1\xff\xbf\xc6\x10\x0c'

I have no idea where this comes from.
Maybe here?

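            // reset the char[]s to avoid weird paths
            memset(buf, '\0', strlen(buf));

(strlen(buf) only covers the bytes up to the first NUL, so if buf is a
fixed-size char array, sizeof(buf) was probably intended here.)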

An updated patch is attached.
What I did in my patch is:

- comment out SM_LARGE_PAGE as it's not available on OSX 10.5
- set the correct C types in Py_BuildValue ("sssIIIIIH")
- return info about dirtied and swapped memory counters

Attachment: osx-mem-maps.patch

giampaolo commented 10 years ago

From jcscoob...@gmail.com on June 25, 2012 15:29:56

Attaching a new version of the patch that takes g.rodola's last patch and fixes
the paths.  The issue described in Comment #25 was not addressed in this patch.
I think that if g.rodola confirms the new patch produces the correct paths, we
should commit it as-is and open a separate enhancement to find the real dylib
names/paths by parsing the dyld_shared_cache_*.map file.

Attachment: psutil_Issue_260_osx_v4.diff

giampaolo commented 10 years ago

From g.rodola on June 25, 2012 15:39:06

Tested on my OSX box and it looks ok.
Please commit the patch and thanks for the great help.
giampaolo commented 10 years ago

From jcscoob...@gmail.com on June 25, 2012 15:54:07

As of r1360, OS X support has been committed to trunk.  I will create a new
issue as an enhancement to finish the work described in comment #25.  Assigning 
to g.rodola.

Owner: g.rodola
Labels: -Progress-3in4 Progress-4in4

giampaolo commented 10 years ago

From g.rodola on June 25, 2012 15:57:48

Status: FixedInSVN

giampaolo commented 10 years ago

From g.rodola on June 27, 2012 11:54:03

0.5.0 is finally out. Closing out as fixed.

Status: Fixed

giampaolo commented 10 years ago

From g.rodola on March 02, 2013 04:07:27

Updated csets after the SVN -> Mercurial migration:
r1301 == revision 56b8a40141ef
r1302 == revision a162934acf6f
r1303 == revision ed40e10ea16e
r1360 == revision 580cd7be5f24