giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License
10.31k stars 1.39k forks source link

[Windows 11] cpu_percent affected by cpu_affinity #2251

Open owillebo opened 1 year ago

owillebo commented 1 year ago

Summary

Description

On Windows 7 (32bit) and Windows 10 (64bit) the cpu_percent for a 'while True' loop is 100.0 regardless of affinity

On Windows 11 however, cpu_percent is 0.0 for affinity != Only CPU 0 and 100.0 for affinity == Only CPU 0.

It seems that cpu_percent is only correct when process is scheduled on the first processor.

Reproduce with code below;

from psutil import Process

process = Process()
process.cpu_percent()

# cpu_percent = 0.0 (most of the time) when affinity != Only CPU 0
# cpu_percent = 100.0 when affinity == Only CPU 0
process.cpu_affinity([0])

count = 0
while True:
  count += 1
  if count % 10000000 == 0:
    print(count, process.cpu_percent(), process.cpu_affinity())
owillebo commented 1 year ago

Sysinternals procexp seems to suffer from the same issue.

image

giampaolo commented 1 year ago

I'm not sure I understand why you did this:

while True:
  count += 1
  if count % 10000000 == 0:
    print(count, process.cpu_percent(), process.cpu_affinity())

As also mentioned in psutil doc, there should be some "sleep" time between cpu_percent() calls to get a meaningful result. Do you get reasonable values if you add time.sleep(1) at the end of the while loop?

owillebo commented 1 year ago

I'm not sure I understand why you did this:

while True:
  count += 1
  if count % 10000000 == 0:
    print(count, process.cpu_percent(), process.cpu_affinity())

As also mentioned in psutil doc, there should be some "sleep" time between cpu_percent() calls to get a meaningful result. Do you get reasonable values if you add time.sleep(1) at the end of the while loop?

cpu_percent is only called when count is a multitude of 10000000. On my laptop this means that cpu_percent is called every 0.9 seconds.

giampaolo commented 1 year ago

Ah you did it in order to make the CPU percent increases before requesting the value. Understood.

To summarize: if you do cpu_affinity([0]) you see 100% usage. If you do cpu_affinity([1]) you see 0% usage?

owillebo commented 1 year ago

Ah you did it in order to make the CPU percent increases before requesting the value. Understood.

To summarize: if you do cpu_affinity([0]) you see 100% usage. If you do cpu_affinity([1]) you see 0% usage?

Correct, and in addition to that;

For any combination of CPUs not holding CPU 0, the usage is 0%. If the combination holds CPU 0 and not only CPU 0 the usage is 0% most of the time.

giampaolo commented 1 year ago

It seems I cannot reproduce it. I have a (virtualized) Windows 11 system with 3 CPUs. I tried your script with [0], [1], and [1,2] combinations and each time CPU usage was close to 100%. If you modify the script like this, what does it print? Do CPU timings change?

import psutil

process = psutil.Process()
process.cpu_percent()
process.cpu_affinity([0])
count = 0
while True:
    count += 1
    if count % 10000000 == 0:
        print(count, process.cpu_times(), psutil.cpu_times(), process.cpu_affinity())
owillebo commented 1 year ago

I also print process.cpu_percent(). Please find the output below;

10000000 pcputimes(user=1.234375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1028.828125, system=571.0937500000291, idle=168171.04687499997, interrupt=45.390625, dpc=39.203125) [0] 100.0
20000000 pcputimes(user=2.46875, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1030.0625, system=571.0937500000291, idle=168184.62499999997, interrupt=45.390625, dpc=39.203125) [0] 99.9
30000000 pcputimes(user=3.703125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1031.296875, system=571.1093750000291, idle=168198.37499999997, interrupt=45.390625, dpc=39.21875) [0] 98.8
40000000 pcputimes(user=4.9375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1032.53125, system=571.1093750000291, idle=168211.95312499997, interrupt=45.390625, dpc=39.21875) [0] 100.0
50000000 pcputimes(user=6.15625, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1033.75, system=571.1093750000291, idle=168225.35937499997, interrupt=45.390625, dpc=39.21875) [0] 100.0
60000000 pcputimes(user=7.375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1034.96875, system=571.1093750000291, idle=168238.76562499997, interrupt=45.390625, dpc=39.21875) [0] 100.1
70000000 pcputimes(user=8.578125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1036.171875, system=571.1093750000291, idle=168251.99999999997, interrupt=45.390625, dpc=39.21875) [0] 99.9
80000000 pcputimes(user=9.8125, system=0.015625, children_user=0.0, children_system=0.0) scputimes(user=1037.40625, system=571.1250000000291, idle=168265.74999999997, interrupt=45.390625, dpc=39.21875) [0] 100.0
90000000 pcputimes(user=11.03125, system=0.015625, children_user=0.0, children_system=0.0) scputimes(user=1038.625, system=571.1250000000291, idle=168279.15624999997, interrupt=45.390625, dpc=39.21875) [0] 100.1
100000000 pcputimes(user=12.265625, system=0.015625, children_user=0.0, children_system=0.0) scputimes(user=1039.859375, system=571.1250000000291, idle=168292.73437499997, interrupt=45.390625, dpc=39.21875) [0] 99.9
110000000 pcputimes(user=13.5, system=0.015625, children_user=0.0, children_system=0.0) scputimes(user=1041.09375, system=571.1406250000291, idle=168306.48437499997, interrupt=45.40625, dpc=39.21875) [0] 98.8
120000000 pcputimes(user=14.734375, system=0.015625, children_user=0.0, children_system=0.0) scputimes(user=1042.328125, system=571.1406250000291, idle=168320.06249999997, interrupt=45.40625, dpc=39.21875) [0] 100.0

10000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.0000000000291, idle=169587.87499999997, interrupt=45.734375, dpc=39.96875) [1] 0.0
20000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.0156250000291, idle=169601.09374999997, interrupt=45.734375, dpc=39.984375) [1] 0.0
30000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.0312500000291, idle=169614.14062499997, interrupt=45.734375, dpc=40.0) [1] 0.0
40000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.0312500000291, idle=169627.20312499997, interrupt=45.734375, dpc=40.0) [1] 0.0
50000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.0312500000291, idle=169640.43749999997, interrupt=45.734375, dpc=40.0) [1] 0.0
60000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.03125, idle=169653.5, interrupt=45.734375, dpc=40.0) [1] 0.0
70000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169673.875, interrupt=45.734375, dpc=40.0) [1] 0.0
80000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169687.28125, interrupt=45.734375, dpc=40.0) [1] 0.0
90000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169700.6875, interrupt=45.734375, dpc=40.0) [1] 0.0
100000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169713.75, interrupt=45.734375, dpc=40.0) [1] 0.0
110000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169727.09375, interrupt=45.734375, dpc=40.0) [1] 0.0
120000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169746.4375, interrupt=45.734375, dpc=40.0) [1] 0.0
130000000 pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1043.84375, system=573.046875, idle=169759.84375, interrupt=45.734375, dpc=40.0) [1] 0.0
10000000 pcputimes(user=0.328125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1044.625, system=574.1250000000291, idle=170531.40624999997, interrupt=45.96875, dpc=40.359375) [0, 1] 26.9
20000000 pcputimes(user=1.296875, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1045.59375, system=574.1250000000291, idle=170545.24999999997, interrupt=45.96875, dpc=40.359375) [0, 1] 78.5
30000000 pcputimes(user=1.328125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1045.625, system=574.1250000000291, idle=170558.48437499997, interrupt=45.96875, dpc=40.359375) [0, 1] 2.6
40000000 pcputimes(user=1.328125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1045.625, system=574.1250000000291, idle=170571.54687499997, interrupt=45.96875, dpc=40.359375) [0, 1] 0.0
50000000 pcputimes(user=2.390625, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1046.6875, system=574.1250000000291, idle=170587.65624999997, interrupt=45.96875, dpc=40.359375) [0, 1] 86.1
60000000 pcputimes(user=3.03125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1047.328125, system=574.1250000000291, idle=170600.71874999997, interrupt=45.96875, dpc=40.359375) [0, 1] 53.9
70000000 pcputimes(user=3.625, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1047.921875, system=574.1250000000291, idle=170615.29687499997, interrupt=45.96875, dpc=40.359375) [0, 1] 48.7
80000000 pcputimes(user=3.625, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1047.921875, system=574.1250000000291, idle=170628.35937499997, interrupt=45.96875, dpc=40.359375) [0, 1] 0.0
90000000 pcputimes(user=4.234375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1048.53125, system=574.1250000000291, idle=170643.37499999997, interrupt=45.96875, dpc=40.359375) [0, 1] 50.7
100000000 pcputimes(user=5.09375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1049.390625, system=574.1250000000291, idle=170656.78124999997, interrupt=45.96875, dpc=40.359375) [0, 1] 70.5
110000000 pcputimes(user=5.09375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1049.390625, system=574.1250000000291, idle=170670.01562499997, interrupt=45.96875, dpc=40.359375) [0, 1] 0.0
120000000 pcputimes(user=5.140625, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1049.4375, system=574.1250000000291, idle=170685.96874999997, interrupt=45.96875, dpc=40.359375) [0, 1] 3.9
130000000 pcputimes(user=5.59375, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1049.890625, system=574.1562500000291, idle=170700.29687499997, interrupt=45.96875, dpc=40.390625) [0, 1] 36.7
140000000 pcputimes(user=6.828125, system=0.0, children_user=0.0, children_system=0.0) scputimes(user=1051.125, system=574.1718750000291, idle=170713.87499999997, interrupt=45.984375, dpc=40.390625) [0, 1] 98.8
owillebo commented 1 year ago

It seems I cannot reproduce it. I have a (virtualized) Windows 11 system with 3 CPUs. I tried your script with [0], [1], and [1,2] combinations and each time CPU usage was close to 100%. If you modify the script like this, what does it print? Do CPU timings change?

import psutil

process = psutil.Process()
process.cpu_percent()
process.cpu_affinity([0])
count = 0
while True:
    count += 1
    if count % 10000000 == 0:
        print(count, process.cpu_times(), psutil.cpu_times(), process.cpu_affinity())

Did you try Windows 11 22H2 (released oct 2022)?

Discussion below seems to suggest problems with "cpu times" for version 22H2;

https://forums.guru3d.com/threads/msi-ab-rtss-development-news-thread.412822/page-179

owillebo commented 1 year ago

More info;

https://learn.microsoft.com/en-us/answers/questions/1183193/totalprocessortime-differs-between-windows-10-and

owillebo commented 1 year ago

Code below shows that this is caused by a bug in Windows 11 22H2.

#include <string> 
#include <iostream>
#include <Windows.h>

#define NSEC (10000000)

int main(int argc, char* argv[])
{
    ULARGE_INTEGER LastTime;
    ::GetSystemTimePreciseAsFileTime((FILETIME*)&LastTime);
    HANDLE handle = ::GetCurrentProcess();
    if (argc > 1)
    {
        DWORD_PTR ProcessAffinityMask;
        ProcessAffinityMask = std::stoull(argv[1U]);
        ::SetProcessAffinityMask(handle, ProcessAffinityMask);
    }
    ULARGE_INTEGER CurrentTime;
    ULARGE_INTEGER CreationTime;
    ULARGE_INTEGER ExitTime;
    ULARGE_INTEGER KernelTime;
    ULARGE_INTEGER UserTime;
    unsigned long long count = 0;
    while (true)
    {
        count += 1;
        if (count % 1000000000 == 0ULL)
        {
            ::GetProcessTimes(
                handle,
                (FILETIME*)&CreationTime,
                (FILETIME*)&ExitTime,
                (FILETIME*)&KernelTime,
                (FILETIME*)&UserTime);
            ::GetSystemTimePreciseAsFileTime((FILETIME*)&CurrentTime);
            std::cout << (double)(KernelTime.QuadPart + UserTime.QuadPart) / (double)(CurrentTime.QuadPart - LastTime.QuadPart) * 100.0 << std::endl;
        }
    }
}
owillebo commented 1 year ago

KB5026446 did not fix it :-(

RobertHenry6bev commented 1 year ago

I have a similar problem. Running every 4 seconds: sutil.cpu_percent(interval=delta_t, percpu=False) will often return a float X which is ~6. Then maybe 5% of the time the call will return 100.0. There's no middle ground. My system has been loaded so that N cores are running a compute-only cycle soaker.

If I run psutil.cpu_percent(interval=delta_t, percpu=True) (every 4 seconds, interleaved with above) and take the arithmetic mean of the resulting vector, the mean taken as the overall system CPU % * nproc is very close to N, as expected, and that's also what TaskExplorer is reporting.

I'm running this on an 80 core ARM64 machine (Ampere), using an internal Windows 11 build 25336 from 230404. (I work for Microsoft.)

I have not tried this on an AMD64 (x86_64) machine.

I have not tried it on a machine with <= 64 logical processors (virtual cpus), either AMD64 or ARM64.

I can believe that CPU affinitization may be to blame, as when I started printing out the percpu=True vector alternating with the percpu=False scalar, I would notice that core 0 was pegged at 100% (and ~N-1 others too), and percpu=False reported 100.0

I am running Python 3.11.3, psutils 5.9.5 (as delivered by pip), compiled on the system that's running this test, with yesterday's community visualstudio

RobertHenry6bev commented 1 year ago

My little python program which only calls psutil.cpu_percent with an interval of 2 seconds, running for hours, occasionally receives an OSError exception:

OSError [WinError 1359] An internal error occurred

WinError 1359 is documented as just that: an internal error. Presumably in the kernel...

giampaolo commented 1 year ago

That would be a first. Please paste the full traceback.

RobertHenry6bev commented 1 year ago

Do you want a python stacktrace from the thrown Python OSError, or do you want a stack trace that drills through the C code making up the psutil package and the python runtime?

RobertHenry6bev commented 1 year ago

This is just the python stack. I am not running under a debugger that can view into the compiled C code in the library

Traceback (most recent call last): File "C:\Users\PSE/git-work/robhenry-perf/dotnet_bench/psutiltop.py", line 42, in run_monitor cum_cpu_percent = psutil.cpu_percent( ^^^^^^^^^^^^^^^^^^^ File "C:\Users\PSE\AppData\local\programs\Python\Python311-arm64\Lib\site-packages\psutil__init__.py", line 1757, in cpu_percent _last_cpu_times = cpu_times() ^^^^^^^^^^^ File "C:\Users\PSE\AppData\local\programs\Python\Python311-arm64\Lib\site-packages\psutil__init__.py", line 1618, in cpu_times return _psplatform.cpu_times() ^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\PSE\AppData\local\programs\Python\Python311-arm64\Lib\site-packages\psutil_pswindows.py", line 299, in cpu_times user, system, idle = cext.cpu_times() ^^^^^^^^^^^^^^^^ OSError: [WinError 1359] An internal error occurred