Open MrPippin66 opened 3 years ago
Hmm... This is the first I've heard of this. I'm trying to understand how it works (https://unixism.net/2019/08/linux-pressure-stall-information-psi-by-example/). The information in those 3 files is easy to extract. What's more difficult is understanding how to interpret that data and imagine an actual use case. For instance, the psutil doc shows an actual use case for psutil.getloadavg(), showing how to translate those raw numbers into a percentage of CPU usage/load over time:
>>> import psutil
>>> psutil.getloadavg()
(3.14, 3.89, 4.67)
>>> psutil.cpu_count()
10
>>> # percentage representation over the last 1, 5, 15 mins
>>> [x / psutil.cpu_count() * 100 for x in psutil.getloadavg()]
[31.4, 38.9, 46.7]
If we were to add this, I would like to provide something similar in the doc: some actual code which does something useful with those raw numbers extracted from /proc/pressure. But in order to do that I/we'd have to properly understand how this works first. =)
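As a starting point for that kind of doc example, here is a minimal sketch of parsing the PSI text format and turning it into a simple "is this resource under pressure?" check. The names `parse_psi` and `is_under_pressure` and the 10% threshold are made up for illustration and are not part of psutil. The sample text follows the format described in the kernel PSI documentation, where the avg values are the percentage of time tasks were stalled over the last 10, 60 and 300 seconds, and total is the cumulative stall time in microseconds.

```python
def parse_psi(text):
    """Parse the content of a PSI file into a nested dict.

    Returns e.g. {"some": {"avg10": 1.5, ..., "total": 123456}, "full": {...}}.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        entry = {}
        for field in fields:
            key, value = field.split("=")
            # "total" is an integer counter; the averages are percentages.
            entry[key] = int(value) if key == "total" else float(value)
        stats[kind] = entry
    return stats


def is_under_pressure(stats, threshold=10.0):
    """Hypothetical check: were some tasks stalled more than `threshold`
    percent of the time over the last 10 seconds?"""
    return stats["some"]["avg10"] > threshold


# Made-up sample in the documented format (not real measurements).
sample = (
    "some avg10=12.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.10 total=65432\n"
)
stats = parse_psi(sample)
print(stats["some"]["avg10"], is_under_pressure(stats))
```

On a live system with PSI enabled, the same function could be fed the content of /proc/pressure/memory (or cpu, or io) instead of the sample string.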
PSI was developed by Facebook. They posted a decent explanation of how they use it, and the benefits it's given them.
https://lwn.net/Articles/759658/
And FYI, that article gives a detailed explanation of the problem with "getloadavg", which goes beyond the issues we've encountered (namely, that you can have several threads that are each active for only a small part of their allocated slice; they show up as high load averages but low overall CPU utilization).
Swap thrashing also isn't the only memory condition that results in low processor throughput. This facility would capture the others too (high reclaim rates, high fault rates, etc.) without requiring complicated monitoring scripts.
I think having this available would simplify monitoring, and hopefully would be used upstream in monitoring products, like "ncpa", etc.
@giampaolo Is this still a feature candidate?
FYI, PSI data is reported in 'sar' data for all current Linux distributions. I think being able to report this data in 'psutil' merits attention.
OS: Linux (kernel 4.20 or higher, unless the vendor has backported the feature)
Type: Performance metrics for CPU, Memory & IO
Summary:
https://www.kernel.org/doc/html/latest/accounting/psi.html
Though this is a relatively new feature, it will become a common source of information for diagnosing performance issues on systems.
I would request that this information become available (if PSI is enabled in the OS and/or cgroup) via the psutil framework, primarily so that tools built atop this framework can readily use it for monitoring purposes.
Ultimately, I'd suggest a new category (psi) to gather these values from.
psutil.psi_cpu()
psutil.psi_memory()
psutil.psi_io()
I'd request both the system level and cgroup2 level data be presented for each category.
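To make the request concrete, here is a rough sketch of how such an API could look. Nothing here exists in psutil: `PsiLine`, `psi_path`, `read_psi` and `psi_memory` are hypothetical names, and the `cgroup` parameter is just one possible way to select per-cgroup data. It assumes the system-wide files live under /proc/pressure and that cgroup2 is mounted at /sys/fs/cgroup, where each cgroup directory exposes cpu.pressure, memory.pressure and io.pressure.

```python
import os
from collections import namedtuple

# Hypothetical result type; psutil does not define this.
PsiLine = namedtuple("PsiLine", ["avg10", "avg60", "avg300", "total"])

PROC_PRESSURE = "/proc/pressure"  # system-wide PSI files
CGROUP2_ROOT = "/sys/fs/cgroup"   # assumed cgroup2 mount point


def psi_path(resource, cgroup=None):
    """Return the PSI file path for 'cpu', 'memory' or 'io'."""
    if resource not in ("cpu", "memory", "io"):
        raise ValueError("unknown resource: %r" % resource)
    if cgroup is None:
        return os.path.join(PROC_PRESSURE, resource)
    return os.path.join(CGROUP2_ROOT, cgroup.strip("/"), resource + ".pressure")


def read_psi(path):
    """Parse one PSI file into {'some': PsiLine, 'full': PsiLine}."""
    result = {}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            kind, *fields = line.split()
            values = dict(field.split("=") for field in fields)
            result[kind] = PsiLine(
                float(values["avg10"]),
                float(values["avg60"]),
                float(values["avg300"]),
                int(values["total"]),
            )
    return result


def psi_memory(cgroup=None):
    """Hypothetical psutil.psi_memory(): system-wide, or for one cgroup."""
    return read_psi(psi_path("memory", cgroup))
```

With this shape, `psi_memory()` would read the system-wide numbers and `psi_memory("system.slice")` the numbers for that cgroup; `psi_cpu()` and `psi_io()` would follow the same pattern.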