google-code-export / psutil

Automatically exported from code.google.com/p/psutil

Mapped memory regions used by process #260

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This might be a nice addition, see:
http://stackoverflow.com/questions/2184775/getting-a-list-of-used-libraries-by-a-running-process-unix

The way I see it, the Process class should grow a new get_shared_libs() method 
returning a list of namedtuples including:

- absolute library path (e.g. /lib/x86_64-linux-gnu/libexpat.so.1)
- address (e.g. 00007f9babf44000)
- mode (e.g. "rw---" or the numeric representation)
- rss (memory resident set size)

Also, if the new method is invoked as such:

>>> get_shared_libs(extended=True)

...we can provide some extra platform-dependent fields such as pss/swap/... 
memory, inode, device no, etc (see "pmap -x PID" output on Linux and "man 
proc", sections /proc/[pid]/maps and /proc/[pid]/smaps).
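
A minimal sketch of how the Linux side of this proposal could read /proc/[pid]/maps into namedtuples. The names MapRegion, parse_maps_line and read_maps are hypothetical and are not part of psutil; the column layout (address, perms, offset, dev, inode, pathname) is the one described in "man proc".

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
MapRegion = namedtuple("MapRegion", "addr perms offset dev inode path")


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a MapRegion."""
    # Split at most 5 times so that a path containing spaces stays intact.
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    # Anonymous mappings have no sixth column.
    path = fields[5].rstrip("\n") if len(fields) > 5 else "[anon]"
    return MapRegion(addr, perms, int(offset, 16), dev, int(inode), path)


def read_maps(pid="self"):
    """Return all regions of a process (Linux only)."""
    with open("/proc/%s/maps" % pid) as f:
        return [parse_maps_line(line) for line in f]
```

On Linux, read_maps() with no argument lists the regions of the calling process; the extra "extended" fields (pss, swap, ...) would have to come from /proc/[pid]/smaps instead.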

For the Windows implementation we can look here: 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682621(v=vs.85).aspx
Judging from process hacker it seems we can retrieve all the fields listed 
above except "mode".

For BSD / OSX see "man pmap", which seems to point at the right sys-calls for 
this kind of job.

Original issue reported on code.google.com by g.rodola on 21 Apr 2012 at 7:59

GoogleCodeExporter commented 9 years ago
Do you also want WOW64 DLLs on 64-bit Windows systems?

Original comment by wj32...@gmail.com on 23 Apr 2012 at 11:59

GoogleCodeExporter commented 9 years ago
I suppose it makes sense, yes, unless it adds too much complexity.

Original comment by g.rodola on 23 Apr 2012 at 12:02

GoogleCodeExporter commented 9 years ago
Preliminary Linux patch in attachment.

Original comment by g.rodola on 23 Apr 2012 at 5:49

Attachments:

GoogleCodeExporter commented 9 years ago
What's the point of the "mode" field? Different sections of a mapped image are 
going to have different protection bits set.

Original comment by wj32...@gmail.com on 23 Apr 2012 at 10:54

GoogleCodeExporter commented 9 years ago
Completely untested patch.

Original comment by wj32...@gmail.com on 24 Apr 2012 at 10:58

Attachments:

GoogleCodeExporter commented 9 years ago
On second thought I think we're better off calling this function 
"get_memory_mappings()" since the stuff we're extracting is not related to 
shared libraries only.
On Linux, for example, we have "heap" and "anonymous" maps, which refer to 
application private data, same thing for other OSes, including Windows.
Looking at process hacker, it seems that the process "memory" tab is 
exactly what we're looking for (it also shows what looks like a "mode" column).

wj32.64, thanks for your patch; do you think you can adapt it to also 
include non-shared-lib mappings?
In process hacker I see "Private", "Mapped", "Free" and other stuff. That's 
what we're looking for and should be placed in "path" field.
Does that come from MODULEENTRY32 struct? If so I can adapt the patch myself.

Thanks in advance.

Original comment by g.rodola on 25 Apr 2012 at 1:58

GoogleCodeExporter commented 9 years ago
Sorry, I misunderstood the purpose of this feature, seeing as it's called 
"Shared libraries". It's easy to query all memory mappings in an address space, 
but it takes a lot more effort to get a list of heaps, stacks, etc.

Original comment by wj32...@gmail.com on 25 Apr 2012 at 2:08

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
> Sorry, I misunderstood the purpose of this feature,

It's my fault as at first I wasn't clear on the technical details involved.

> It's easy to query all memory mappings in an address space

Can you provide some more info? Is your patch still a valid starting point?

> it takes a lot more effort to get a list of heaps, stacks, etc.

Never mind, I can live with that. 
The main reason why this is needed is to provide a more detailed process memory 
usage on Linux, as explained here:
http://bmaurer.blogspot.it/2006/03/memory-usage-with-smaps.html
Maybe that doesn't apply to Windows.
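
As a rough illustration of the smaps-based accounting mentioned above, the sketch below turns the text of /proc/[pid]/smaps into one dict per mapped region. The function name parse_smaps and the chosen counters are assumptions for the example; this is not the code that was committed to psutil.

```python
def parse_smaps(text, keys=("Rss", "Pss", "Swap")):
    """Return one dict per mapped region with the selected counters in bytes."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if not fields:
            continue
        if fields[0].endswith(":"):
            # Counter line, e.g. "Rss:                  16 kB".
            key = fields[0][:-1]
            if key in keys and regions:
                regions[-1][key.lower()] = int(fields[1]) * 1024
        else:
            # Header line; same layout as a /proc/[pid]/maps line.
            region = dict.fromkeys((k.lower() for k in keys), 0)
            region["addr"] = fields[0]
            region["perms"] = fields[1]
            region["path"] = fields[5] if len(fields) > 5 else "[anon]"
            regions.append(region)
    return regions
```

On Linux, something like parse_smaps(open("/proc/self/smaps").read()) would give the per-region numbers; summing the "pss" values gives a figure that accounts for pages shared between processes.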

Original comment by g.rodola on 25 Apr 2012 at 3:33

GoogleCodeExporter commented 9 years ago
I'll need to rewrite the patch. It's a completely different API to query memory 
regions.

Original comment by wj32...@gmail.com on 25 Apr 2012 at 3:56

GoogleCodeExporter commented 9 years ago
Linux implementation committed as r1301.
The API I chose consists in get_memory_maps(grouped=True).
Example:

>>> get_memory_maps(grouped=True)
[mmap(path='/lib/x86_64-linux-gnu/libutil-2.13.so', rss=16384, shared=2109440, ...]
[mmap(path='/usr/lib/python2.7/lib-dynload/_heapq.so', rss=16384, shared=2109440, ...]
[mmap(path='[heap]', rss=16384, shared=2109440, ...]
...
>>> get_memory_maps(grouped=False)
[mmap(addr='00400000-00633000', perms='r-xp', path='/lib/x86_64-linux-gnu/libutil-2.13.so', rss=16384, shared=2109440, ...]
[mmap(addr='00400000-00633000', perms='r-x-', path='/usr/lib/python2.7/lib-dynload/_heapq.so', rss=16384, shared=2109440, ...]
[mmap(addr='00400000-00633000', perms='rw--', path='[heap]', rss=16384, shared=2109440, ...]
...
>>>

I think there's really no point in returning multiple values for the same 
memory region which are only different in terms of "mode" and "address" by 
default.
Both "mode" and "address" are very low level details that most people shouldn't 
be interested in, so by default we get rid of them and *group* all the memory 
mappings with the same path.
This article reflects exactly this idea:
http://people.redhat.com/berrange/olpc/performance/epiphany/
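
The grouping idea described above can be sketched as follows: regions that share a path are merged, their numeric counters are summed, and "addr" and "perms" are dropped. Region, Grouped and group_by_path are hypothetical names for this example and only two counters are shown; the committed implementation may differ.

```python
from collections import namedtuple

# Hypothetical record types, loosely modelled on the output shown above.
Region = namedtuple("Region", "addr perms path rss shared")
Grouped = namedtuple("Grouped", "path rss shared")


def group_by_path(regions):
    """Merge regions with the same path, summing rss and shared."""
    totals = {}
    order = []  # remember first-seen order of paths
    for r in regions:
        if r.path not in totals:
            totals[r.path] = [0, 0]
            order.append(r.path)
        totals[r.path][0] += r.rss
        totals[r.path][1] += r.shared
    return [Grouped(p, totals[p][0], totals[p][1]) for p in order]
```

With this shape, the ungrouped list stays available for callers who care about addresses and permissions, while the grouped list gives one row per file or pseudo-path such as "[heap]".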

Original comment by g.rodola on 26 Apr 2012 at 2:08

GoogleCodeExporter commented 9 years ago
New patch (don't know if it compiles)

Original comment by wj32...@gmail.com on 26 Apr 2012 at 2:17

Attachments:

GoogleCodeExporter commented 9 years ago
FreeBSD implementation committed in r1302.

Original comment by g.rodola on 26 Apr 2012 at 11:37

GoogleCodeExporter commented 9 years ago
wj32.64, thanks for your patch, but now that I look at what it is able to 
extract it appears quite insufficient compared to process hacker.
I think we can use your first patch and provide process shared modules only and 
specify this in the doc.

Original comment by g.rodola on 26 Apr 2012 at 12:20

GoogleCodeExporter commented 9 years ago
What do you mean? It should be giving you the same things that PH gives you.

Original comment by wj32...@gmail.com on 26 Apr 2012 at 12:24

GoogleCodeExporter commented 9 years ago
Oh right, this line:

if (basicInfo.Type == MEM_MAPPED)

...was preventing the interesting stuff from being shown. =)

Original comment by g.rodola on 26 Apr 2012 at 1:04

GoogleCodeExporter commented 9 years ago
Windows implementation committed as r1303.
Thanks a lot wj32.64.

Original comment by g.rodola on 26 Apr 2012 at 2:28

GoogleCodeExporter commented 9 years ago
Attached is a partially working OSX implementation.
I was able to retrieve memory addresses and memory consumption but I'm not able 
to determine the *name* of the memory regions.

It seems we can use proc_regionfilename() from /usr/include/libproc.h but I was 
not able to make it work:
http://prod.lists.apple.com/archives/darwin-kernel/2012/Apr/msg00024.html

Original comment by g.rodola on 10 May 2012 at 5:21

Attachments:

GoogleCodeExporter commented 9 years ago
Assigning this one to Jeremy as per his request.

Original comment by g.rodola on 10 May 2012 at 5:28

GoogleCodeExporter commented 9 years ago
Documenting for posterity: 
http://lists.apple.com/archives/darwin-kernel/2007/Jun/msg00056.html

Original comment by jcscoob...@gmail.com on 25 May 2012 at 10:36

GoogleCodeExporter commented 9 years ago
Documenting for posterity (Seems to have some stuff in it that could be used to 
get things working): 
http://www.opensource.apple.com/source/gdb/gdb-413/src/gdb/macosx/macosx-nat-dyld.c

Original comment by jcscoob...@gmail.com on 26 May 2012 at 6:51

GoogleCodeExporter commented 9 years ago
As of last night, here is my progress:

mmap(path='/usr/local/Cellar/mongodb/2.0.4-x86_64/bin/mongod', rss=45355008, private=17870848, ref_count=5, shadow_count=1)
mmap(path='/usr/lib/dyld', rss=4938358784, private=249856, ref_count=322, shadow_count=1)
mmap(path='COW', rss=15459876864, private=227119104, ref_count=9891, shadow_count=21)
mmap(path='ALI', rss=22663168, private=4096, ref_count=96, shadow_count=0)
mmap(path='/private/var/db/dyld/dyld_shared_cache_x86_64', rss=9542041600, private=5718016, ref_count=4345, shadow_count=6)
mmap(path='NUL', rss=2196004864, private=0, ref_count=0, shadow_count=0)
mmap(path='PRV', rss=2868117504, private=901120, ref_count=25, shadow_count=0)
mmap(path='???', rss=1612718080, private=36864, ref_count=5, shadow_count=0)

I'm not 100% certain it's right, and the paths for non-image regions are likely 
not ideal (I stole them from how top displays them).

Original comment by jcscoob...@gmail.com on 30 May 2012 at 4:46

GoogleCodeExporter commented 9 years ago
You'll also notice that the 5th entry is not a dylib but is instead a path to 
the DYLD shared cache.  I'm looking at how to get the real path.

Original comment by jcscoob...@gmail.com on 30 May 2012 at 4:49

GoogleCodeExporter commented 9 years ago
I'm attaching an updated (not finished) patch.  It makes a full round trip so 
now what is required are these:

* Make sure the memory numbers/counts are accurate
* Figure out a way to get the real dylib from the dyld_shared_cache
* Make sure the path for non image paths is correct

Original comment by jcscoob...@gmail.com on 30 May 2012 at 6:07

Attachments:

GoogleCodeExporter commented 9 years ago
Looks like to get the dylib name from the dyld_shared_cache, we'll need to 
write a parser for the dyld_shared_cache_*.map file and then lookup the dylib 
filename based on the memory address for the region.
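
The address lookup half of that idea could look roughly like the sketch below. It assumes the dyld_shared_cache_*.map file has already been parsed into (start, end, name) ranges; the format of that file is not described in this thread, so the parser itself is left out. build_index and lookup are hypothetical names.

```python
import bisect


def build_index(ranges):
    """Index (start, end, name) ranges; end is exclusive, ranges must not overlap."""
    ordered = sorted(ranges)
    starts = [r[0] for r in ordered]
    return starts, ordered


def lookup(index, address):
    """Return the name of the range containing address, or None."""
    starts, ordered = index
    # Rightmost range whose start is <= address.
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0:
        start, end, name = ordered[i]
        if start <= address < end:
            return name
    return None
```

A binary search keeps each lookup cheap even when the cache contains a large number of dylibs, which matters if it is done once per memory region.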

Original comment by jcscoob...@gmail.com on 30 May 2012 at 6:33

GoogleCodeExporter commented 9 years ago
Issue 96 has been merged into this issue.

Original comment by g.rodola on 3 Jun 2012 at 10:44

GoogleCodeExporter commented 9 years ago
I took a look at this and verified that RSS memory and address match vmmap 
(yay!).
There's a problem with some paths though.

Here:

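import os
import psutil

p = psutil.Process(os.getpid())
# grouped=False: one entry per region, including addr and perms
for m in p.get_memory_maps(grouped=False):
    if not m.path.startswith('[') and not os.path.exists(m.path):
        print "%s %s %s %s" % (m.addr, m.rss / 1024, m.perms, repr(m.path))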

...it prints:

bash-3.2$ python foo.py 
0000000000000000-0000000000001000 0 ---/--- '/usr/local/bin/python2.7p\xbc\xe7'
0000000000383000-00000000003d0000 248 r--/rwx '/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation\xe1\xff\xbf\xc6\x10\x0c'

I have no idea where this comes from.
Maybe here?

            // reset the char[]s to avoid weird paths
            memset(buf, '\0', strlen(buf));

An updated patch is in attachment.
What I did in my patch is:

- comment out SM_LARGE_PAGE as it's not available on OSX 10.5
- set the correct C types in Py_BuildValue ("sssIIIIIH")
- return info about dirtied and swapped memory counters

Original comment by g.rodola on 20 Jun 2012 at 2:05

Attachments:

GoogleCodeExporter commented 9 years ago
Attaching a new version of the patch that takes g.rodola's last patch and fixes 
the paths. The issue described in Comment #25 was not addressed in this patch. 
I think if g.rodola says the new patch creates the correct paths, we should 
commit as-is and file a separate enhancement to find the real dylib 
names/paths by parsing the dyld_shared_cache_*.map file.

Original comment by jcscoob...@gmail.com on 25 Jun 2012 at 10:29

Attachments:

GoogleCodeExporter commented 9 years ago
Tested on my OSX box and it looks ok.
Please commit the patch and thanks for the great help.

Original comment by g.rodola on 25 Jun 2012 at 10:39

GoogleCodeExporter commented 9 years ago
As of r1360, OS X support has been committed to trunk.  I will create a new 
issue as an enhancement to finish the work described in comment #25.  Assigning 
to g.rodola.

Original comment by jcscoob...@gmail.com on 25 Jun 2012 at 10:54

GoogleCodeExporter commented 9 years ago

Original comment by g.rodola on 25 Jun 2012 at 10:57

GoogleCodeExporter commented 9 years ago
0.5.0 is finally out. Closing out as fixed.

Original comment by g.rodola on 27 Jun 2012 at 6:54

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Updated csets after the SVN -> Mercurial migration:
r1301 == revision 56b8a40141ef
r1302 == revision a162934acf6f
r1303 == revision ed40e10ea16e
r1360 == revision 580cd7be5f24

Original comment by g.rodola on 2 Mar 2013 at 12:07