Closed exsver closed 2 years ago
@simondeziel any thoughts or preference on this?
We want to represent seconds for consistency with the rest of the ecosystem in general (see https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#units-and-base-units and https://www.robustperception.io/who-wants-seconds/).
I checked the code (https://github.com/lxc/lxd/blob/ca3e42ea5e119a1ad8213e133627d334f6e12562/lxd/cgroup/abstraction.go#L261-L344) and the 10^9 division we are doing gives us seconds if I followed properly.
In the v1 case, https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cpuacct.html mentions that nanoseconds are used for cpuacct.usage
. We use cpuacct.usage_all
but they seem to use the same unit (nanoseconds).
In the v2 case, https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files says microseconds are used everywhere and the fields we are using even have _usec
in their name. https://github.com/lxc/lxd/blob/ca3e42ea5e119a1ad8213e133627d334f6e12562/lxd/cgroup/abstraction.go#L327 converts those microseconds to nanoseconds.
tl;dr: we are dealing with seconds so we should fix the HELP
text.
To verify all this, here's the top
usage sorted by CPU time of one of my container:
root@puppet:~# top
top - 13:10:29 up 9 days, 23:20, 0 users, load average: 0.00, 0.00, 0.00
Tasks: 67 total, 1 running, 66 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 99.7 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 2560.0 total, 653.6 free, 1684.6 used, 221.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 875.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
204 puppet 20 0 3265492 1.1g 28656 S 0.0 43.1 159:16.09 java
198 puppetdb 20 0 2870628 486576 23756 S 0.3 18.6 20:04.12 java
1 root 20 0 170228 8708 5244 S 0.0 0.3 0:50.66 systemd
110210 postgres 20 0 215692 21600 19552 S 0.0 0.8 0:05.36 postgres
110216 postgres 20 0 70192 4460 2284 S 0.0 0.2 0:02.86 postgres
110215 postgres 20 0 216364 6112 3772 S 0.0 0.2 0:01.43 postgres
126 systemd+ 20 0 19068 5020 4104 S 0.0 0.2 0:01.26 systemd-network
110214 postgres 20 0 215692 8808 6740 S 0.0 0.3 0:01.22 postgres
110212 postgres 20 0 215816 49560 47416 S 0.0 1.9 0:00.95 postgres
127 systemd+ 20 0 24608 9240 5128 S 0.0 0.4 0:00.94 systemd-resolve
59 root 20 0 35736 6796 5860 S 0.0 0.3 0:00.77 systemd-journal
110213 postgres 20 0 215692 7260 5188 S 0.0 0.3 0:00.61 postgres
118 syslog 20 0 154688 2584 2068 S 0.0 0.1 0:00.23 rsyslogd
121372 postgres 20 0 218712 19408 15092 S 0.0 0.7 0:00.07 postgres
121369 postgres 20 0 218168 16868 13280 S 0.0 0.6 0:00.05 postgres
110217 postgres 20 0 216120 5052 2796 S 0.0 0.2 0:00.03 postgres
121668 root 20 0 9220 2716 2128 R 0.0 0.1 0:00.03 top
That's roughly ~180 minutes of CPU which amounts to ~10800 seconds of user
time. This is roughly what the metrics API gives me:
$ lxc query -X GET --wait xeon:/1.0/metrics | grep seconds_total | grep puppet
lxd_cpu_seconds_total{cpu="0",mode="system",name="puppet",project="default",type="container"} 1231
lxd_cpu_seconds_total{cpu="0",mode="user",name="puppet",project="default",type="container"} 10903
@tomponline my only question/concern is why are those always integer?
@simondeziel you want them as floats?
@tomponline yes, float64 is what is stored by Prometheus (https://prometheus.io/docs/concepts/data_model/#samples). It can deal with int but that's leaving precision on the table.
@exsver is this something you would be willing to look at?
For cpu_time in seconds, it would be better to use floats.
float64(stats.System / 1000000000)
-> float64(stats.System) / float64(1000000000)
float64(stats.User / 1000000000)
-> float64(stats.User) / float64(1000000000)
Required information
Issue description
lxc query -X GET --wait /1.0/metrics reports metric "lxd_cpu_seconds_total" in seconds, but help says: "The total number of CPU seconds used in milliseconds."
So... error in help here: https://github.com/lxc/lxd/blob/ca3e42ea5e119a1ad8213e133627d334f6e12562/lxd/metrics/types.go#L138 or error in calculation here: https://github.com/lxc/lxd/blob/ca3e42ea5e119a1ad8213e133627d334f6e12562/lxd/instance/drivers/driver_lxc.go#L6921-L6922