Closed sorantis closed 2 years ago
Pinging @elastic/integrations-services (Team:Services)
Looking at the docs for the CPU accounting controller here: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cpuacct.html
user and system are in USER_HZ unit.
These are the tick units we reference elsewhere.
So, looking at the original issue, I'm a tad confused by the wording. If we want to map any given tick usage in cgroups to overall usage, that should be possible, as we're dealing with the same units, and we measure everything based off time deltas. If we're just trying to normalize the per-cpu cgroup metrics, that's possible as well.
@fearful-symmetry Here's the original request:
The customer uses LXC as their container environment and we want to monitor the containers with Metricbeat, but we need to figure out how to get pct metrics, especially for stats user and stats system.
I've updated the issue description with more info coming from a support case. This is doable at the module level, we should be good adding this to the next release cycle. Thanks @fearful-symmetry, going to un-assign you and move the issue to our backlog.
@masci / @sorantis / @kesslerm a brief update, since we want this done for 7.13
I spent a good bit of the day digging through how we "normalize" cpu metrics in other parts of metricbeat, since our CPU monitoring code is spread across what feels like 4 different libraries.
So, I'm assuming that we want a "nice" percentage number that's comparable to how we report cpu usage percentages elsewhere.
There's an interesting caveat here I didn't notice earlier: the cgroup itself reports totals in nanoseconds, but user and kernel time in USER_HZ. We do some math to convert everything to nanoseconds, which is why we get numbers like 15226370000000 compared to 3831401199507. This means that we're not going to get entirely accurate numbers if we start trying to calculate percentages from user and system. Elsewhere, everything is in USER_HZ, so the math is cleaner.
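The tick-to-nanosecond conversion mentioned above can be sketched roughly as follows. This is an illustration only, not Metricbeat's actual code: the function name is hypothetical, and USER_HZ is assumed to be 100, which is common on Linux but should be looked up at runtime in practice.

```python
NANOS_PER_SECOND = 1_000_000_000


def ticks_to_ns(ticks: int, user_hz: int = 100) -> int:
    """Convert a USER_HZ tick count to nanoseconds using integer math.

    user_hz=100 is an assumption; real code should query the system value.
    """
    return ticks * NANOS_PER_SECOND // user_hz


# With USER_HZ=100 each tick is 10 ms, so a converted value always ends in
# seven zeros, while the cgroup's native nanosecond total does not.
print(ticks_to_ns(1522637))  # 15226370000000
```

The trailing zeros show that the converted user and system values only have 10 ms resolution, even though they are expressed in nanoseconds.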
There are some other caveats here, as the cgroup docs mention:
cpuacct controller uses percpu_counter interface to collect user and system times. This has two side effects:
It is theoretically possible to see wrong values for user and system times. This is because percpu_counter_read() on 32bit systems isn’t safe against concurrent writes.
It is possible to see slightly outdated values for user and system times due to the batch processing nature of percpu_counter.
So, we might run into issues if we try and make normalized percentages of user and system.
Thanks for the analysis @fearful-symmetry. I'm trying to understand whether our approach is to convert user and kernel from the converted nanoseconds to pct, or from the USER_HZ to pct. Will both cases lead to inaccurate numbers for pct?
Also, for the two mentioned caveats:
I'm trying to understand whether our approach is to convert user and kernel from the converted nanoseconds to pct, or from the USER_HZ to pct. Will both cases lead to inaccurate numbers for pct?
The issue is more that the cpuacct API at /sys/fs/cgroup reports metrics in two formats. For other CPU usage data, like /proc/stat and /proc/[PID]/stat, which is where we get our CPU usage metrics for system/cpu and system/process, the entire sets of metrics are reported in USER_HZ. Converting these to nanoseconds or percentages isn't an issue, as the kernel provides APIs to make the math more reliable (_SC_CLK_TCK).
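To illustrate why a single unit keeps the math clean, here is a minimal sketch of a system/cpu-style percentage taken from two samples of the aggregate cpu line in /proc/stat. The function names and the sample lines are made up for illustration and are not Metricbeat's implementation. Because every counter is in USER_HZ ticks, the unit cancels out and no conversion is needed.

```python
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # zip() stops at the shorter sequence, so trailing guest fields are ignored.
    return dict(zip(FIELD_NAMES, map(int, fields[1:])))


def cpu_percentages(prev: dict, curr: dict) -> dict:
    """Share of each counter in the total ticks elapsed between two samples."""
    deltas = {name: curr[name] - prev[name] for name in curr}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: delta / total for name, delta in deltas.items()}


# Hypothetical samples taken one collection period apart.
before = parse_cpu_line("cpu 100 0 50 800 0 0 0 0")
after = parse_cpu_line("cpu 150 0 75 925 0 0 0 0")
pct = cpu_percentages(before, after)
print(pct["user"], pct["system"], pct["idle"])  # 0.25 0.125 0.625
```

If seconds are needed instead of ratios, the tick rate can be read with os.sysconf("SC_CLK_TCK") on systems that expose it.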
The issue is that we already get some metrics in nanoseconds and some not, so I think we're going to get some interesting math. For example, to emulate how system/cpu does percents using the CPU total, if we calculated user and system based on your example above, we'd get 59% and 34%, which doesn't quite add up. We can emulate system/process instead, and use the nanoseconds between collection intervals as a stand-in for totals, but I wonder if this will result in some slight discrepancies between the numbers for system, user and the numbers for everything else. The latter method is probably better, and we just might want to put some disclaimers in the docs.
For context, I assume the customer used this snapshot of Metricbeat. Correct me if I'm wrong @liladler
@liladler how many process events did the system collect? The very first event that metricbeat collects will have the percentages set to zero, as it needs processes across time to create a percentage. Also, can I get the entire event?
@fearful-symmetry By now the system has collected millions of documents. This is a full event:
"event" : {
"dataset" : "system.process",
"duration" : 460504378,
"module" : "system"
},
"env" : "tier1",
"@timestamp" : "2021-04-28T12:32:20.401Z",
"logstash" : {
"tier1" : "flt025547",
"tier2" : "flt031502"
},
"system" : {
"process" : {
"state" : "sleeping",
"cmdline" : "/usr/sbin/nscd",
"cgroup" : {
"blkio" : {
"id" : "lxv1394",
"total" : {
"ios" : 196713,
"bytes" : 1138053120
},
"path" : "/lxc/lxv1394"
},
"id" : "lxv1394",
"cpuacct" : {
"id" : "lxv1394",
"percpu" : {
"5" : 4468439526456,
"1" : 4246731060274,
"4" : 4944152156744,
"6" : 4296676679369,
"3" : 5297998803640,
"2" : 5326331389417
},
"stats" : {
"system" : {
"ns" : 14492870000000,
"pct" : 0.001,
"norm" : {
"pct" : 2.0E-4
}
},
"user" : {
"ns" : 12603240000000,
"pct" : 0.002,
"norm" : {
"pct" : 3.0E-4
}
}
},
"total" : {
"ns" : 28580329615900,
"pct" : 0.004,
"norm" : {
"pct" : 7.0E-4
}
},
"path" : "/lxc/lxv1394"
},
"cpu" : {
"id" : "lxv1394",
"rt" : {
"period" : {
"us" : 1000000
},
"runtime" : {
"us" : 0
}
},
"stats" : {
"periods" : 0,
"throttled" : {
"ns" : 0,
"periods" : 0
}
},
"cfs" : {
"shares" : 1024,
"quota" : {
"us" : 0
},
"period" : {
"us" : 100000
}
},
"path" : "/lxc/lxv1394"
},
"memory" : {
"mem" : {
"failures" : 0,
"limit" : {
"bytes" : 9223372036854771712
},
"usage" : {
"max" : {
"bytes" : 1065435136
},
"bytes" : 992104448
}
},
"stats" : {
"page_faults" : 1200998712,
"unevictable" : {
"bytes" : 0
},
"pages_in" : 225533559,
"inactive_anon" : {
"bytes" : 199987200
},
"hierarchical_memory_limit" : {
"bytes" : 9223372036854771712
},
"active_anon" : {
"bytes" : 478814208
},
"inactive_file" : {
"bytes" : 113795072
},
"rss" : {
"bytes" : 57786368
},
"swap" : {
"bytes" : 0
},
"hierarchical_memsw_limit" : {
"bytes" : 9223372036854771712
},
"rss_huge" : {
"bytes" : 8388608
},
"cache" : {
"bytes" : 934318080
},
"pages_out" : 256972324,
"major_page_faults" : 1496,
"mapped_file" : {
"bytes" : 11005952
},
"active_file" : {
"bytes" : 199507968
}
},
"kmem_tcp" : {
"failures" : 0,
"limit" : {
"bytes" : 9223372036854771712
},
"usage" : {
"max" : {
"bytes" : 0
},
"bytes" : 0
}
},
"kmem" : {
"failures" : 0,
"limit" : {
"bytes" : 9223372036854771712
},
"usage" : {
"max" : {
"bytes" : 0
},
"bytes" : 0
}
},
"id" : "lxv1394",
"memsw" : {
"failures" : 0,
"limit" : {
"bytes" : 9223372036854771712
},
"usage" : {
"max" : {
"bytes" : 1065435136
},
"bytes" : 992104448
}
},
"path" : "/lxc/lxv1394"
},
"path" : "/lxc/lxv1394"
},
"fd" : {
"limit" : {
"soft" : 1024,
"hard" : 4096
},
"open" : 12
},
"cpu" : {
"start_time" : "2021-03-23T12:50:41.000Z",
"total" : {
"value" : 128970,
"pct" : 0,
"norm" : {
"pct" : 0
}
}
},
"memory" : {
"size" : 605614080,
"share" : 1269760,
"rss" : {
"pct" : 1.0E-4,
"bytes" : 1998848
}
}
}
},
"host" : {
"name" : "lx00590"
},
"@version" : "1",
"fields" : {
"elastic_index" : "demo"
},
"tags" : [
"beats_input_raw_event"
],
"metricset" : {
"name" : "process",
"period" : 10000
},
"agent" : {
"hostname" : "lx00590",
"name" : "lx00590",
"id" : "fc1b3dfe-79b8-4608-bdb6-52aadee95b32",
"version" : "7.13.0",
"ephemeral_id" : "7bd1a27f-4962-4ba8-a334-a32bd34ab60e",
"type" : "metricbeat"
},
"process" : {
"state" : "sleeping",
"command_line" : "/usr/sbin/nscd",
"pgid" : 40965,
"args" : [
"/usr/sbin/nscd"
],
"cpu" : {
"start_time" : "2021-03-23T12:50:41.000Z",
"pct" : 0
},
"name" : "nscd",
"ppid" : 40886,
"working_directory" : "/",
"pid" : 40965,
"executable" : "/usr/sbin/nscd",
"memory" : {
"pct" : 1.0E-4
}
},
"user" : {
"name" : "nscd"
},
"ecs" : {
"version" : "1.9.0"
},
"service" : {
"type" : "system"
},
"type" : "metricbeat",
"protocol" : "tcp"
}
@liladler based on what you sent me, I'm having a hard time telling if something is wrong:
"total" : {
"ns" : 28580329615900,
"pct" : 0.004,
"norm" : {
"pct" : 7.0E-4
}
},
A cpuacct usage of 0.4% for a random background process seems pretty normal. The normalized percentage is the usage divided by the CPU count, as it's "normalized" by the average usage across all CPUs, so we get 0.004/6 = ~0.0007, or 0.07%. Can we try filtering/sorting the processes by CPU usage and seeing if the numbers seem a bit more normal? Alternatively, are any events reporting a usage that's actually 0?
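The arithmetic above can be checked against the values in the event. This is a sketch under two assumptions: the CPU count is taken from the number of percpu entries, and the result is rounded to four decimal places to match the reported figures. Neither is confirmed here as how Metricbeat does it.

```python
# Values copied from the cpuacct section of the event above.
percpu_ns = {
    "1": 4246731060274,
    "2": 5326331389417,
    "3": 5297998803640,
    "4": 4944152156744,
    "5": 4468439526456,
    "6": 4296676679369,
}
total_ns = 28580329615900
total_pct = 0.004

num_cpus = len(percpu_ns)  # assumption: one percpu entry per CPU

# The per-CPU counters add up exactly to the reported total.
percpu_sum = sum(percpu_ns.values())

# Normalizing divides the usage by the number of CPUs.
norm_pct = round(total_pct / num_cpus, 4)

print(num_cpus, percpu_sum == total_ns, norm_pct)  # 6 True 0.0007
```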
Considering that all the relevant PRs have been merged, do we want to close this issue?
@liladler has the customer tried the recommendation? Everything seems to be working in order. If there are no further questions from the customer then we'll close the issue.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@fearful-symmetry is this issue still relevant following all the refactors you did?
@jlind23 looks like all the changes have been merged, we should be able to close this.
The customer would like to gather normalized CPU accounting per cgroup (e.g. for LXC containers). Metricbeat's CPU accounting can be collected per cgroup, but is then reported in snapshots of nanoseconds of CPU time since the cgroup was started, for example:
Since we do have the total nanoseconds, we can provide the percpu values as normalized percentages, similar to how it's done for the system module's cpu metricset:
We would probably need a similar config option for cgroups, like: