giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

RFE: return max/peak RSS usage from Process.memory_info #1096

Open sam-s opened 7 years ago

sam-s commented 7 years ago

The peak memory usage is an important parameter to monitor. It is available as the nonstandard ru_maxrss field in struct rusage on most UNIX platforms (although in different units: bytes on BSD, kilobytes on Linux), and you are already returning it from memory_info on Windows as peak_wset. It would be nice if you could add it with a consistent cross-platform name and units (like you already do with rss and vms) to the pmem structure on all platforms. The name should probably be peak_rss and the units should be bytes (for consistency with rss). Thank you!

giampaolo commented 7 years ago

I'm not sure this can be done, as getrusage only works for the current process or its children (https://docs.python.org/2/library/resource.html#resource.getrusage), so it cannot be used for all processes / pids. This information may be stored somewhere else, though. E.g. on Linux, I suppose this is it:

cat /proc/self/status  | grep Peak
VmPeak:     7456 kB
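On Linux the same file exists for any PID as /proc/<pid>/status, so reading it is not limited to the calling process. Below is a minimal sketch, assuming the usual "Key:   value kB" line format. Note that VmPeak is the peak *virtual* memory size, while VmHWM ("high water mark") is the peak resident set size, which is the closer match for a peak RSS value. The helper name `read_peak_memory` is hypothetical, and the sketch does not work on macOS, which has no /proc.

```python
import os


def read_peak_memory(pid):
    """Return peak memory figures (in bytes) for a Linux process.

    Hypothetical helper: parses /proc/<pid>/status, where Linux reports
    VmPeak (peak virtual size) and VmHWM (peak resident set size) in kB.
    Keys are omitted if the kernel does not report them (e.g. for kernel
    threads). Raises FileNotFoundError if the PID or /proc does not exist.
    """
    wanted = {"VmPeak", "VmHWM"}
    result = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in wanted:
                # A line looks like "VmHWM:      5432 kB"; keep the number.
                amount = value.split()[0]
                result[key] = int(amount) * 1024
    return result


if __name__ == "__main__":
    # Works for any PID we can read, not just our own.
    print(read_peak_memory(os.getpid()))
```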

sam-s commented 7 years ago

I am afraid I don't understand what you mean by

> it cannot be used for all processes / pids

If you want to return the process+children peak memory usage, you can call getrusage twice and add the ru_maxrss fields. This will be an upper bound because the process and its children may reach their max RSS usage at different times, but it will be a good start.
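The "call getrusage twice and add" idea can be sketched with the standard resource module (Unix only). This is a minimal sketch with a hypothetical helper name, `combined_maxrss`. The sum stays in the platform's native ru_maxrss unit, and it is only an estimate: on Linux, RUSAGE_CHILDREN reports the RSS of the largest terminated, waited-for child rather than a total over the process tree, and the two peaks need not coincide.

```python
import resource
import subprocess
import sys


def combined_maxrss():
    """Add the peak RSS of this process and of its waited-for children.

    Hypothetical helper. The result is in the native ru_maxrss unit
    (kilobytes on Linux, bytes on macOS) and is only a rough estimate.
    """
    self_usage = resource.getrusage(resource.RUSAGE_SELF)
    child_usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return self_usage.ru_maxrss + child_usage.ru_maxrss


if __name__ == "__main__":
    # Run (and wait for) a child so RUSAGE_CHILDREN has something to report.
    subprocess.run(
        [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"], check=True
    )
    print(combined_maxrss())
```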

My point is that this value (max RSS, peak memory usage, etc.) is a critically important parameter, and making it available seems like a good idea. It might not be trivial, but it is doable.

giampaolo commented 7 years ago

> I am afraid I don't understand what you mean by "it cannot be used for all processes / pids"

The psutil.Process() class can be used with any PID, not only os.getpid() or its children. getrusage is different in this regard: it only works for os.getpid() (and its children), so we cannot rely on it.

sam-s commented 7 years ago

You mean threads? You can return the value for the owner process.

At any rate, even if the problem is harder than I think, it does not look insurmountable, and googling shows that your users ask for it on Stack Overflow all the time. :-)

Thank you!

giampaolo commented 7 years ago

You don't understand. Any Process method must be usable with any PID, not only os.getpid(). If you want to push for this functionality, you should demonstrate that it is possible to get the peak RSS of any process, not only os.getpid(), ideally on multiple UNIX platforms. And BTW, what you ask for the current process is already possible in pure Python:

>>> import resource
>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
10476
>>> resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
0

sam-s commented 7 years ago
  1. Okay, I now see that you use os.getpid() to mean self as opposed to "other processes". Sorry about being dense.
  2. For "other processes" we are stuck with /proc/<pid>/status (which is, yes, platform-specific and does not exist on macOS).
  3. resource.getrusage is no good as-is because it returns kB on Linux and bytes on BSD (including macOS).
  4. If one cannot get the peakvm/maxrss value for non-self process on a given platform, it is okay to return None or 0.

Thanks for your patience.

giampaolo commented 7 years ago

Assuming that:

...I think this is not worth it. Returning None or 0 would be a first in the psutil API, and I don't like that. I suggest you stick with resource.getrusage and convert kB to bytes if you're on Linux.
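Following that suggestion, here is a minimal sketch that reports ru_maxrss in bytes. It assumes, per the getrusage(2) man pages, that macOS reports ru_maxrss in bytes and Linux in kilobytes; treating every other platform as kilobytes is an assumption of this sketch. The helper name `peak_rss_bytes` is hypothetical, and it still only covers os.getpid() and its children, which is the limitation discussed above.

```python
import resource
import sys


def peak_rss_bytes(who=resource.RUSAGE_SELF):
    """Return ru_maxrss for the calling process (or its children) in bytes.

    Hypothetical helper. macOS reports ru_maxrss in bytes, Linux in
    kilobytes; other platforms are assumed to use kilobytes here.
    """
    maxrss = resource.getrusage(who).ru_maxrss
    if sys.platform == "darwin":
        return maxrss
    return maxrss * 1024


if __name__ == "__main__":
    print(peak_rss_bytes())                          # this process
    print(peak_rss_bytes(resource.RUSAGE_CHILDREN))  # waited-for children
```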

sam-s commented 7 years ago

Peak memory usage can also be determined on Windows (peak_wset). See https://pythonhosted.org/psutil/#psutil.Process.memory_info