GuillaumeGomez / sysinfo

Cross-platform library to fetch system information
MIT License

Failure to get CPU usage on Windows #1381

Status: Closed (kathoum closed this issue 2 weeks ago)

kathoum commented 2 weeks ago

Describe the bug

sysinfo 0.32.0 on Windows 10

Rarely, the first call to refresh CPU usage fails because of an error in Windows Performance Counters (error code is PDH_CSTATUS_NO_MACHINE or PDH_CSTATUS_NO_OBJECT). This happens if another program (typically, an installer) is updating the registry keys related to performance counters.

When it happens, all following attempts to refresh CPU usage will also fail, and the error does not go away even if the program builds a new sysinfo::System object.

To Reproduce

Sample code:

// cargo add sysinfo --features debug
// cargo add windows-sys --features Win32_System_Performance

fn main() {
    let mut system = sysinfo::System::new();
    system.refresh_cpu_usage();
    let ncpu = system.cpus().len();
    let usage = system.global_cpu_usage();
    println!("startup: cpus={} usage={}%", ncpu, usage);
    if usage == 0.0 {
        for _ in 0..5 {
            std::thread::sleep(std::time::Duration::from_secs(1));
            system.refresh_cpu_usage();
            println!("now: {}%", system.global_cpu_usage());
        }

        std::thread::sleep(std::time::Duration::from_secs(1));
        system = sysinfo::System::new();
        println!("after building new System: usage: {}%", system.global_cpu_usage());

        for _ in 0..5 {
            std::thread::sleep(std::time::Duration::from_secs(1));
            system.refresh_cpu_usage();
            println!("now: {}%", system.global_cpu_usage());
        }

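        // Force PDH to reload its cached counter data: no output buffer,
        // zero buffer size, and bRefresh = 1 (TRUE).
        unsafe {
            use std::ptr::{null, null_mut};
            use windows_sys::Win32::System::Performance::*;
            PdhEnumObjectsW(null(), null(), null_mut(), &mut 0, PERF_DETAIL_NOVICE, 1);
        }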

        std::thread::sleep(std::time::Duration::from_secs(1));
        system = sysinfo::System::new();
        println!("after force-reloading performance counters: usage: {}%", system.global_cpu_usage());

        for _ in 0..5 {
            std::thread::sleep(std::time::Duration::from_secs(1));
            system.refresh_cpu_usage();
            println!("now: {}%", system.global_cpu_usage());
        }
    }
}

While the test program is running in a loop (e.g. in PowerShell: for (;;) { .\target\debug\testinfo.exe }), open an elevated command prompt and run "lodctr /R", which rebuilds the performance counter registry settings.

I get the following output:

Query::add_english_counter: failed to add counter 'tot_0': 800007d0...
Query::add_english_counter: failed to add counter '0_0': c0000bb8...
Query::add_english_counter: failed to add counter '1_0': c0000bb8...
Query::add_english_counter: failed to add counter '2_0': c0000bb8...
Query::add_english_counter: failed to add counter '3_0': c0000bb8...
Query::add_english_counter: failed to add counter '4_0': c0000bb8...
Query::add_english_counter: failed to add counter '5_0': c0000bb8...
Query::add_english_counter: failed to add counter '6_0': c0000bb8...
Query::add_english_counter: failed to add counter '7_0': c0000bb8...
failed to refresh CPU data
startup: cpus=8 usage=0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
after building new System: usage: 0%
Query::add_english_counter: failed to add counter 'tot_0': c0000bb8...
Query::add_english_counter: failed to add counter '0_0': c0000bb8...
Query::add_english_counter: failed to add counter '1_0': c0000bb8...
Query::add_english_counter: failed to add counter '2_0': c0000bb8...
Query::add_english_counter: failed to add counter '3_0': c0000bb8...
Query::add_english_counter: failed to add counter '4_0': c0000bb8...
Query::add_english_counter: failed to add counter '5_0': c0000bb8...
Query::add_english_counter: failed to add counter '6_0': c0000bb8...
Query::add_english_counter: failed to add counter '7_0': c0000bb8...
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
failed to refresh CPU data
now: 0%
after force-reloading performance counters: usage: 0%
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
Query::get: PdhGetFormattedCounterValue failed
now: 100%
now: 8.905312%
now: 17.471794%
...
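
For readers matching the hex codes in this output against the error names mentioned in the description, here is a minimal sketch of the mapping. The two constant values are the ones defined in pdhmsg.h; the helper names `pdh_status_name` and `parse_status` are hypothetical and not part of sysinfo or windows-sys.

```rust
// PDH status codes named in this report (values as defined in pdhmsg.h).
const PDH_CSTATUS_NO_MACHINE: u32 = 0x8000_07D0;
const PDH_CSTATUS_NO_OBJECT: u32 = 0xC000_0BB8;

/// Hypothetical helper: returns the symbolic name for the two codes
/// seen in the output, or None for any other value.
fn pdh_status_name(code: u32) -> Option<&'static str> {
    match code {
        PDH_CSTATUS_NO_MACHINE => Some("PDH_CSTATUS_NO_MACHINE"),
        PDH_CSTATUS_NO_OBJECT => Some("PDH_CSTATUS_NO_OBJECT"),
        _ => None,
    }
}

/// Hypothetical helper: parses a code as printed in the debug output,
/// e.g. "c0000bb8...", ignoring the trailing dots.
fn parse_status(hex: &str) -> Option<u32> {
    u32::from_str_radix(hex.trim_end_matches("..."), 16).ok()
}
```

With this mapping, the first failure on 'tot_0' (800007d0) reads as PDH_CSTATUS_NO_MACHINE, and every later failure (c0000bb8) reads as PDH_CSTATUS_NO_OBJECT.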

Analysis

I am inclined to say the bug is not in sysinfo, but in Windows's performance counters library.

The issue is that the Pdh library caches the content of the registry key with the counter data, so, if the registry key is broken at program startup, all future queries will fail.

As the test program shows, a possible workaround is to call PdhEnumObjects with bRefresh=TRUE before initializing a new sysinfo::System.

Since the CPU counters are expected to be present in all systems, I suggest that sysinfo adds a call to PdhEnumObjects and builds a new PDH_HQUERY when refreshing CPU usage values, if the initialization of CPU performance counters failed the previous time.
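
To make the suggestion concrete, here is a rough, platform-independent sketch of the retry logic. It is not sysinfo's actual code and may differ from whatever fix is eventually merged. The `CounterBackend` trait, `CpuUsage`, and `FakePdh` are hypothetical names invented for illustration; in a real implementation `force_reload` would wrap PdhEnumObjects with bRefresh=TRUE and `open_cpu_query` would build a new PDH_HQUERY and add the CPU counters.

```rust
/// Hypothetical abstraction over the PDH calls (not part of sysinfo's API).
trait CounterBackend {
    /// Would call PdhEnumObjects with bRefresh=TRUE to drop PDH's cached data.
    fn force_reload(&mut self);
    /// Would open a new query and add the CPU counters; Err carries the PDH status.
    fn open_cpu_query(&mut self) -> Result<(), u32>;
    /// Would collect a sample and return the global CPU usage in percent.
    fn sample(&mut self) -> Option<f32>;
}

struct CpuUsage<B: CounterBackend> {
    backend: B,
    query_ready: bool,
    init_failed: bool,
}

impl<B: CounterBackend> CpuUsage<B> {
    fn new(backend: B) -> Self {
        CpuUsage { backend, query_ready: false, init_failed: false }
    }

    /// Refreshes CPU usage. If the previous initialization failed, force a
    /// reload of the counter data before building a new query.
    fn refresh(&mut self) -> Option<f32> {
        if !self.query_ready {
            if self.init_failed {
                self.backend.force_reload();
            }
            match self.backend.open_cpu_query() {
                Ok(()) => {
                    self.query_ready = true;
                    self.init_failed = false;
                }
                Err(_) => {
                    self.init_failed = true;
                    return None;
                }
            }
        }
        self.backend.sample()
    }
}

/// Hypothetical stand-in that mimics the caching behaviour described above:
/// the registry state is cached on first use and only re-read on force_reload.
struct FakePdh {
    registry_ok: bool,
    cache_ok: Option<bool>,
    reloads: u32,
}

impl CounterBackend for FakePdh {
    fn force_reload(&mut self) {
        self.reloads += 1;
        self.cache_ok = Some(self.registry_ok);
    }
    fn open_cpu_query(&mut self) -> Result<(), u32> {
        let ok = *self.cache_ok.get_or_insert(self.registry_ok);
        if ok { Ok(()) } else { Err(0xC000_0BB8) } // PDH_CSTATUS_NO_OBJECT
    }
    fn sample(&mut self) -> Option<f32> {
        Some(12.5) // fixed placeholder value, not a real measurement
    }
}
```

In this model, a first `refresh()` against a broken registry returns `None` and records the failure. The next `refresh()` forces a reload before retrying, so it recovers once the registry is healthy again. The reload is only attempted after a failure, so the normal path pays no extra cost.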

GuillaumeGomez commented 2 weeks ago

Thanks for the detailed explanations! Seems like you already know how to fix it. Wanna send a PR? :)

GuillaumeGomez commented 2 weeks ago

Fixed by #1385.