facebookincubator / below

A time traveling resource monitor for modern Linux systems
Apache License 2.0

fb_procfs: Can't tell which `Stat::cpus` entry corresponds to which CPU when CPUs are hot[un]plugged #8190

Closed: htejun closed this issue 1 year ago

htejun commented 1 year ago

`Stat::cpus` is an `Option<Vec<CpuStat>>` populated from the `cpuN` lines in `/proc/stat`, one entry per line. However, the CPU numbers in `/proc/stat` may have holes. Here's an example from a 4-CPU qemu instance:

# cat /proc/stat
cpu  113613 21 2406 5547556 641 0 144 0 0 0
cpu0 26133 18 1309 2725972 370 0 134 0 0 0
cpu1 32707 2 422 2709984 266 0 2 0 0 0
cpu2 28312 0 587 67 0 0 7 0 0 0
cpu3 26460 0 86 111532 5 0 0 0 0 0

After taking cpu2 offline with `echo 0 > /sys/devices/system/cpu/cpu2/online`, the file looks as follows:

# cat /proc/stat
cpu  113618 21 2412 8207884 686 0 145 0 0 0
cpu0 26136 18 1312 2733593 371 0 135 0 0 0
cpu1 32708 2 424 2717609 268 0 2 0 0 0
cpu3 26460 0 87 119161 6 0 0 0 0 0

`KernelStats::cpu_time` will contain 3 entries, with no reliable way to tell which three CPUs they describe. Cross-checking against other sources doesn't really work either, as there may be intervening hot[un]plug operations between reads.
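To make the ambiguity concrete, here is a minimal sketch of a positional parser. It is not the fb_procfs code; the function `user_time_by_position` and the constant `STAT_AFTER_OFFLINE` are hypothetical names used only for illustration. It keeps the first (user) column of each `cpuN` line and, like a plain `Vec`, drops the CPU number from the label.

```rust
/// /proc/stat excerpt from the report above, after cpu2 was taken offline.
const STAT_AFTER_OFFLINE: &str = "\
cpu  113618 21 2412 8207884 686 0 145 0 0 0
cpu0 26136 18 1312 2733593 371 0 135 0 0 0
cpu1 32708 2 424 2717609 268 0 2 0 0 0
cpu3 26460 0 87 119161 6 0 0 0 0 0
";

/// Hypothetical positional parser: collects the user-time column of every
/// per-CPU line in file order and discards the CPU number in the label.
fn user_time_by_position(stat: &str) -> Vec<u64> {
    stat.lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let label = fields.next()?;
            // Keep "cpuN" lines only; the aggregate "cpu" line has no digits.
            let suffix = label.strip_prefix("cpu")?;
            if suffix.is_empty() || !suffix.bytes().all(|b| b.is_ascii_digit()) {
                return None;
            }
            // First numeric column is user time.
            fields.next()?.parse::<u64>().ok()
        })
        .collect()
}
```

With the excerpt above, the result has three elements and index 2 holds the value from the `cpu3` line, so a consumer that treats the index as the CPU number would attribute cpu3's counters to cpu2.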

It seems like `KernelStats::cpu_time`, instead of being a `Vec`, should be keyed by the CPU ID read from `/proc/stat`.
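One way the keyed layout could look is sketched below, assuming a `BTreeMap` keyed by the number parsed from each `cpuN` label. The `CpuTimes` struct, `parse_cpus_by_id`, and `SAMPLE` are hypothetical names for this illustration, and only the first four columns are kept; this is not necessarily how the library implements or will implement it.

```rust
use std::collections::BTreeMap;

/// /proc/stat excerpt from the report, taken after cpu2 went offline.
const SAMPLE: &str = "\
cpu  113618 21 2412 8207884 686 0 145 0 0 0
cpu0 26136 18 1312 2733593 371 0 135 0 0 0
cpu1 32708 2 424 2717609 268 0 2 0 0 0
cpu3 26460 0 87 119161 6 0 0 0 0 0
";

/// Illustrative subset of the per-CPU counters (not the fb_procfs `CpuStat`).
#[derive(Debug, Clone, PartialEq, Eq)]
struct CpuTimes {
    user: u64,
    nice: u64,
    system: u64,
    idle: u64,
}

/// Parse the per-CPU lines into a map keyed by the CPU number taken from the
/// "cpuN" label. Lines that do not match, or are malformed, are skipped.
fn parse_cpus_by_id(stat: &str) -> BTreeMap<u32, CpuTimes> {
    let mut cpus = BTreeMap::new();
    for line in stat.lines() {
        let mut fields = line.split_whitespace();
        let label = match fields.next() {
            Some(label) => label,
            None => continue,
        };
        // The aggregate "cpu" line leaves an empty suffix, which fails to parse.
        let id = match label.strip_prefix("cpu").and_then(|n| n.parse::<u32>().ok()) {
            Some(id) => id,
            None => continue,
        };
        let vals: Vec<u64> = fields.take(4).filter_map(|f| f.parse().ok()).collect();
        if let &[user, nice, system, idle] = vals.as_slice() {
            cpus.insert(id, CpuTimes { user, nice, system, idle });
        }
    }
    cpus
}
```

Parsing `SAMPLE` yields the keys 0, 1, and 3, so the hole left by the offline cpu2 is visible to the caller, and each entry stays attached to its CPU ID even if hot[un]plug operations happen between reads.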

brianc118 commented 1 year ago

#8193 should fix this. I will try to tag a new version within the next week so you can use it.