facebookincubator / below

A time traveling resource monitor for modern Linux systems
Apache License 2.0

CPU pressure is always zero #8205

Open yump opened 10 months ago

yump commented 10 months ago

To reproduce: stress-ng --cpu $(( $(nproc) * 2))

Expected behavior: CPU pressure column in the General tab of the cgroups view reads >0%, because the CPU is oversubscribed.

Actual behavior: CPU pressure column reads 0%.

Looking over on the Pressure tab, we find that the load shows up in the "CPU Some Pressure" column. The kernel documentation says:

CPU full is undefined at the system level, but has been reported since 5.13, so it is set to zero for backward compatibility.

My opinion here is that "CPU Some" should replace "CPU full" on the General tab, and probably "CPU full" should not be collected at all. "CPU Some" is useful for detecting transient CPU saturation that doesn't show up in CPU utilization because the workload is bursty.
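For reference, the two values being compared come from the kernel's pressure stall files, which contain one "some" line and one "full" line. Below is a minimal Python sketch of parsing that text. The helper name parse_psi and the sample numbers are made up for illustration, and this is not how below itself collects the data.

```python
def parse_psi(text):
    """Parse PSI text (e.g. the contents of /proc/pressure/cpu) into a dict.

    Returns {"some": {...}, "full": {...}}, where each inner dict holds
    avg10/avg60/avg300 as floats (percent) and total as an int (microseconds).
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Illustrative sample only, not a real measurement: an oversubscribed
# machine where "some" is high and the system-level "full" line is zero.
SAMPLE = """\
some avg10=42.10 avg60=30.55 avg300=12.00 total=123456789
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""

psi = parse_psi(SAMPLE)
print(psi["some"]["avg10"], psi["full"]["avg10"])
```

On a live system the same function could be fed the contents of /proc/pressure/cpu, or of a cgroup's cpu.pressure file, assuming the kernel was built with PSI enabled.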

brianc118 commented 10 months ago

So this is a question of UI and whether General tab should display Some vs Full pressure for CPU.

My first thought here is it makes sense to keep the General tab displaying "Full" pressure for Mem/CPU/IO as it truly shows when all threads in a cgroup are stalled on a particular resource. "Some" can be misleading, as it can show up when the real issue is in memory or some other resource.

That said, no strong opinion. Let me ask around for what others think.

yump commented 10 months ago

Ah, that was enough of a clue that I was able to figure out that the "all threads in a cgroup stalled on CPU" condition can be produced with either a quota or a heavy CPU load outside the cgroup.
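The difference between the two conditions can be sketched with a toy model in Python. This is a simplification for illustration only, not the kernel's actual accounting: time is split into ticks, each task in a cgroup is "running", "waiting" (runnable but without a CPU), or "idle", and the function name cpu_pressure is hypothetical.

```python
def cpu_pressure(timeline):
    """Return (some, full) as fractions of ticks for one cgroup.

    timeline is a list of ticks; each tick is a list of task states:
    "running", "waiting" (runnable but not on a CPU), or "idle".
    some: at least one task was waiting during the tick.
    full: at least one task was waiting and no task was running.
    """
    if not timeline:
        return 0.0, 0.0
    some = full = 0
    for tick in timeline:
        waiting = tick.count("waiting")
        running = tick.count("running")
        if waiting > 0:
            some += 1
            if running == 0:
                full += 1
    return some / len(timeline), full / len(timeline)


# Four busy tasks on two CPUs: someone always waits, but someone always runs.
oversubscribed = [["running", "running", "waiting", "waiting"]] * 10

# Two tasks under a quota: they run for half of the ticks and are all
# held back together for the other half.
quota_limited = [["running", "running"], ["waiting", "waiting"]] * 5

print(cpu_pressure(oversubscribed))  # some is high, full stays at zero
print(cpu_pressure(quota_limited))   # some and full rise together
```

In this model, oversubscription inside the cgroup only raises "some", while a quota (or outside load that takes every CPU away from the cgroup) raises "full" as well.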

brianc118 commented 10 months ago

@yump do you still think having Some CPU pressure on the General tab is a better idea? If so we can continue the conversation, otherwise I'll close this issue.

yump commented 10 months ago

I do still think so, for the desktop use case anyhow. I don't use quotas, and it's Highly Unusual to have more than one CPU-intensive application active at the same time. (Although, if that did happen, it's quite likely a fault condition.)

dschatzberg commented 10 months ago

@yump I don't feel too strongly on this topic but I'm not convinced showing Some CPU Pressure is better.

The scenario where you'll see Some CPU Pressure but not Full CPU Pressure is when you're saturating the machine (as you did with stress-ng at nproc * 2), but that is already apparent from the host CPU util, which should be ~100%. I don't see CPU pressure being terribly meaningful in such a scenario, and it is always visible on the dedicated Pressure tab.

On the other hand, Full CPU Pressure can reveal a misconfigured or "too low" CPU limit. To be fair, this is a condition we're more concerned with in the datacenter/server world than on the desktop. But I think it shows a scenario that would otherwise be hard to catch at a glance, which is why it's my preference.
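As a rough illustration of the "too low" limit case, a cgroup v2 CPU limit and its throttling counters can be read from the cpu.max and cpu.stat files. The Python sketch below only parses their text; the helper names and the sample values are hypothetical, and it is not taken from below's code.

```python
def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max ("<quota> <period>" or "max <period>").

    Returns the limit in CPUs (quota / period), or None when unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cpu_stat(text):
    """Parse cgroup v2 cpu.stat ("key value" per line) into a dict of ints."""
    stat = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stat[key] = int(value)
    return stat


def throttled_fraction(stat):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stat.get("nr_periods", 0)
    return stat.get("nr_throttled", 0) / periods if periods else 0.0


# Illustrative values only: a cgroup capped at half a CPU that was
# throttled in 80 of its 100 enforcement periods.
SAMPLE_MAX = "50000 100000\n"
SAMPLE_STAT = """\
usage_usec 5000000
nr_periods 100
nr_throttled 80
throttled_usec 4000000
"""

print(parse_cpu_max(SAMPLE_MAX), throttled_fraction(parse_cpu_stat(SAMPLE_STAT)))
```

A high throttled fraction alongside nonzero Full CPU Pressure would be consistent with the limit being the bottleneck, though what counts as "high" depends on the workload.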