Extiward opened this issue 3 weeks ago
Internally the code uses NtQuerySystemInformation
https://github.com/giampaolo/psutil/blob/7cae974b9baa669f3ce738f5cd02458cd0d8c7d9/psutil/arch/windows/cpu.c#L103-L105
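For context, here's a minimal ctypes sketch of the same query (psutil's real code is C; the struct layout is the documented SYSTEM_PROCESSOR_PERFORMANCE_INFORMATION, information class 8):

```python
import ctypes
import os
from ctypes import wintypes

ntdll = ctypes.WinDLL("ntdll")
SystemProcessorPerformanceInformation = 8  # information class

class SYSTEM_PROCESSOR_PERFORMANCE_INFORMATION(ctypes.Structure):
    # All times are LARGE_INTEGERs counting 100-nanosecond intervals;
    # the kernel returns one entry per logical CPU.
    _fields_ = [
        ("IdleTime", ctypes.c_longlong),
        ("KernelTime", ctypes.c_longlong),
        ("UserTime", ctypes.c_longlong),
        ("DpcTime", ctypes.c_longlong),
        ("InterruptTime", ctypes.c_longlong),
        ("InterruptCount", wintypes.ULONG),
    ]

buf = (SYSTEM_PROCESSOR_PERFORMANCE_INFORMATION * os.cpu_count())()
retlen = wintypes.ULONG()
status = ntdll.NtQuerySystemInformation(
    SystemProcessorPerformanceInformation,
    ctypes.byref(buf), ctypes.sizeof(buf), ctypes.byref(retlen))
assert status == 0, hex(status & 0xFFFFFFFF)  # 0 == STATUS_SUCCESS
n = retlen.value // ctypes.sizeof(SYSTEM_PROCESSOR_PERFORMANCE_INFORMATION)
for i in range(n):
    # Caveat: on machines with more than one processor group, this call
    # only reports the CPUs of the calling thread's group.
    cpu = buf[i]
    print(i, cpu.IdleTime, cpu.KernelTime, cpu.UserTime)
```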
Unfortunately, that function's documentation says:

> NtQuerySystemInformation may be altered or unavailable in future versions of Windows. Applications should use the alternate functions listed in this topic.
Of course, the suggested alternate function is completely wrong for this use case; it's the one that only gives system-wide times:

> Use GetSystemTimes instead to retrieve this information.
I've seen other functions changing behavior in Windows 11.
This code should probably be switched to use performance counters ("Processor Information").
> When using cpu_percent with percpu=False to display CPU load the value is always much lower than expected, e.g. cpu_percent returns a low or single-digit percent, while the CPU actually is loaded to e.g. 50-70% (when looking at Task Manager). When using percpu=True only one element [...]
According to this description, both cpu_percent(percpu=False) and cpu_percent(percpu=True) return incorrect values (@Extiward am I correct?).
Note: internally cpu_percent(percpu=False) relies on GetSystemTimes. Unlike NtQuerySystemInformation, the MS doc does not officially discourage or deprecate it. It even says:
> On a multiprocessor system, the values returned are the sum of the designated times across all processors.
So are we sure GetSystemTimes is at fault here? It's an old and well-established Windows API.
For reference, here are the links to the psutil implementation.
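To make the discussion concrete, this is the kind of two-sample delta that cpu_percent(percpu=False) boils down to; a minimal ctypes sketch, not psutil's actual code (which is C):

```python
import ctypes
import time
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

def _to_int(ft):
    # A FILETIME is a 64-bit count of 100-nanosecond intervals.
    return (ft.dwHighDateTime << 32) | ft.dwLowDateTime

def system_times():
    idle = wintypes.FILETIME()
    kernel = wintypes.FILETIME()
    user = wintypes.FILETIME()
    if not kernel32.GetSystemTimes(
            ctypes.byref(idle), ctypes.byref(kernel), ctypes.byref(user)):
        raise ctypes.WinError(ctypes.get_last_error())
    return _to_int(idle), _to_int(kernel), _to_int(user)

i1, k1, u1 = system_times()
time.sleep(1)
i2, k2, u2 = system_times()
# Kernel time includes idle time, so busy = total - idle.
total = (k2 - k1) + (u2 - u1)
busy = total - (i2 - i1)
print("CPU%%: %.1f" % (100.0 * busy / total))
```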
ChatGPT seems to confirm GetSystemTimes is basically deprecated on modern systems:
> Q: is it true that GetSystemTimes no longer returns accurate results on recent windows versions, and instead I should use performance counters
>
> Yes, this is accurate to an extent. On recent versions of Windows, starting with Windows 8 and Windows Server 2012, the behavior of the GetSystemTimes function changed due to improvements in the way the operating system tracks CPU usage, particularly on modern hardware with dynamic clock speeds (e.g., Turbo Boost, power-saving features).
>
> - Modern CPUs adjust their clock speeds dynamically based on workload and power management policies. GetSystemTimes relies on tick-based counters, which can become inconsistent when the clock speed changes.
> - The precision of the timers used internally by GetSystemTimes may not account for all variations in CPU usage, especially on systems with energy-saving features enabled.
> - Scaling issues: on systems with multiple cores or hyper-threading, the reported CPU times may not fully align with actual performance or workload distribution.
It's unfortunate I have to learn this from an AI instead of the MS docs. :-\
If this is true, it may indeed make sense to calculate system CPU times by using perf counters. I remember you, Daniel (@dbwiddis), did something similar: you replaced a native Windows API with performance counters for swap_memory() in #2160. Perhaps that suggests perf counters should also be used elsewhere, not only in the swap and CPU functions (sigh!).
There seems to be one problem: according to the code (e.g. see here and here), some performance counters may be disabled and fail. As such, we should probably ship a dual implementation: try perf counters first, and fall back to the native Windows API if they fail.
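Shape-wise, the dual implementation could look like this (all helper names are hypothetical, just to illustrate the fallback order):

```python
def cpu_times_percpu():
    # Hypothetical: prefer the "Processor Information" perf counters,
    # fall back to NtQuerySystemInformation if the counters are
    # disabled, corrupted, or otherwise fail.
    try:
        return _cpu_times_from_perf_counters()
    except OSError:
        return _cpu_times_from_ntquery()
```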
And still unsolved, since we're discussing 2 problems here: it's not clear how to replace NtQuerySystemInformation to collect per-CPU metrics.
> If this is true, it may indeed make sense to calculate system CPU times by using perf counters. I remember you, Daniel (@dbwiddis), did something similar: you replaced a native Windows API with performance counters for swap_memory() in #2160. Perhaps that suggests perf counters should also be used elsewhere, not only in the swap and CPU functions (sigh!).
Yes, that's generally what I've done over on the Java/JNA side.
> There seems to be one problem: according to the code (e.g. see here and here), some performance counters may be disabled and fail. As such, we should probably ship a dual implementation: try perf counters first, and fall back to the native Windows API if they fail.
Having navigated through the range of associated problems over the years and implemented multiple fallbacks, yes, "it's complicated". Here are some of the obstacles:
In both of the above cases, it may be possible to use a WMI table to fetch the counters from the same source without using the PDH functions. It can be slower (COM overhead) but typically works as a backup.
When they're disabled, you can't do anything; WMI doesn't even work as a backup. Just say so in an error message, but allow for configuration to minimize log messages in that case. :)
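To illustrate the WMI-as-backup route: a sketch using the third-party wmi package (the class name is the real formatted mirror of the "Processor Information" counters; whether this is the right fallback for psutil is a separate question):

```python
import wmi  # third-party: pip install wmi (wraps COM/WQL)

c = wmi.WMI(namespace="root\\cimv2")
# The 'formatted' counterpart of the "Processor Information" counters;
# this fails too if the underlying counters are disabled.
for cpu in c.Win32_PerfFormattedData_Counters_ProcessorInformation():
    print(cpu.Name, cpu.PercentProcessorTime)
```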
> And still unsolved, since we're discussing 2 problems here: it's not clear how to replace NtQuerySystemInformation to collect per-CPU metrics.
That's the "Processor Information" performance counters. Here's the Corresponding WMI Table (it's the 'formatted' one that gives usage metrics you'd expect, the 'raw' data gives "ticks").
Note "Processor Information" is processor-group aware but is Windows 7+. There is a similar "Processor" performance counter that can be used pre-Win7, but it is not processor-group aware.
Also note "Processor Information" can give you "real" tick counts, but then your users will complain that you don't match the Task Manager output, so you'll need a configuration option to choose whether to use the "Utility" counters rather than the "Percent" counters.
That's a lot to chew on. Let's see what I can do. In the meantime... thanks as always. =) The above info is very useful.
Description
When using cpu_percent with percpu=False to display CPU load, the value is always much lower than expected: e.g. cpu_percent returns a low or single-digit percent, while the CPU is actually loaded to e.g. 50-70% (when looking at Task Manager). When using percpu=True, only one element in the array contains a large number (the high-load element seems to change from run to run), which roughly corresponds to the full CPU utilization (see output example below). The CPU has 12 cores and 24 threads.
Code snippet:
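A minimal loop consistent with the output below (reconstruction; the interval and iteration count are guesses):

```python
import psutil

# Print per-CPU load five times, sampling over 1-second intervals.
for _ in range(5):
    load = psutil.cpu_percent(interval=1, percpu=True)
    print(f"CPU load: {load}%")
```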
Example output:

CPU load: [0.0, 0.0, 1.6, 3.1, 0.0, 3.1, 0.0, 4.7, 0.0, 0.0, 0.0, 1.6, 0.0, 4.7, 1.6, 0.0, 1.6, 3.1, 3.1, 0.0, 0.0, 3.1, 1.6, 42.4]%
CPU load: [3.1, 3.1, 6.2, 1.6, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.6, 0.0, 0.0, 1.6, 0.0, 0.0, 3.1, 1.6, 41.5]%
CPU load: [0.0, 1.6, 6.2, 6.2, 0.0, 0.0, 1.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0, 70.1]%
CPU load: [4.6, 0.0, 3.1, 4.7, 0.0, 1.6, 1.6, 1.6, 1.6, 1.6, 4.7, 3.1, 0.0, 3.1, 10.9, 3.1, 0.0, 4.7, 3.1, 10.9, 1.6, 0.0, 3.1, 50.0]%
CPU load: [0.0, 0.0, 0.0, 6.3, 0.0, 0.0, 1.6, 3.1, 0.0, 0.0, 3.1, 0.0, 0.0, 3.1, 3.1, 1.6, 1.6, 3.1, 0.0, 3.1, 0.0, 1.6, 0.0, 35.4]%
That can't be correct behavior. The expected result would be a roughly even load across all cores, as seen in the attached screenshot.