canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Expose LXD daemon metrics #12362

Closed simondeziel closed 7 months ago

simondeziel commented 1 year ago

Related to #12333.

Currently, LXD exposes multiple Go runtime metrics as well as a few metrics about itself:

# when NO instances are running
$ lxc query /1.0/metrics | grep ^lxd_ | grep -vF lxd_go_
lxd_operations_total 0
lxd_warnings_total 1
lxd_uptime_seconds 10883.400932528
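
The filtering done by the `grep` pipeline above can also be sketched in Python when the metrics text is already in hand. This is only an illustration: the function name `lxd_daemon_metrics` and the sample input are made up for this example, and it assumes each sample line is `name value` with no trailing timestamp.

```python
def lxd_daemon_metrics(metrics_text: str) -> dict[str, float]:
    """Keep LXD's own metrics, dropping comments and Go runtime metrics.

    Roughly mirrors `grep ^lxd_ | grep -vF lxd_go_`, except that only a
    leading `lxd_go_` prefix is excluded.
    """
    result: dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith("lxd_") or line.startswith("lxd_go_"):
            continue
        # Split on the last space so any label set stays part of the key.
        name, _, value = line.rpartition(" ")
        result[name] = float(value)
    return result


# Illustrative input only; the lxd_go_ line stands in for the Go metrics.
sample = """\
# a comment line
lxd_go_goroutines 120
lxd_operations_total 0
lxd_warnings_total 1
lxd_uptime_seconds 10883.400932528
"""

for name, value in lxd_daemon_metrics(sample).items():
    print(name, value)
```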

It would be useful to add more metrics along these lines:

# lxc query /1.0 | jq -r .environment.lxc_features
lxd_lxc_features{"cgroup2": "true", "core_scheduling": "true", ...} 1

# lxc query /1.0 | jq -r .environment.kernel_features
lxd_kernel_features{"idmapped_mounts": "true", ...} 1

# lxc query /1.0 | jq -r .environment.driver_version
lxd_driver_versions{"lxc": "5.0.0", "qemu": "8.0.5"} 1

# lxc query /1.0 | jq -r .environment.storage_supported_drivers
lxd_storage_supported_drivers_versions{"btrfs": "5.16.2", "ceph": "17.2.6", ..., "zfs": "2.1.9-2ubuntu1.1"} 1

# misc info from lxc query /1.0 | jq -r .environment
lxd_server_info{"certificate_fingerprint": "abc01293...", "clustered": "false", "version": "5.18", "os_name": "Ubuntu", "os_version": "22.04", "firewall": "nftables", "kernel_architecture": "x86_64", "kernel_version": "6.2.0-34-generic"} 1

# New generic metrics
lxd_instances_total 32
lxd_containers_total 17
lxd_vms_total 15
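
The proposed lines above use a JSON-like notation for labels. In the Prometheus text exposition format, labels are written as `name="value"` pairs, and a constant value of `1` is the usual way to carry version or feature information. A minimal Python sketch of rendering such an "info" metric from a dictionary follows; the helper names `escape_label_value` and `info_metric` are hypothetical and not part of LXD, and the sketch assumes the dictionary keys are already valid label names.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def info_metric(name: str, labels: dict[str, str]) -> str:
    """Render one sample line with a constant value of 1.

    Assumes every key is a valid label name (letters, digits, underscores,
    not starting with a digit). Labels are sorted for stable output.
    """
    pairs = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} 1"


# Values taken from the driver versions example in the proposal.
print(info_metric("lxd_driver_versions", {"lxc": "5.0.0", "qemu": "8.0.5"}))
```

Whether each feature should be its own label or its own series is a design question this sketch does not settle.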

tomponline commented 1 year ago

If we include an entry for each stopped instance do we need the summary metrics for instance counts?

simondeziel commented 1 year ago

> If we include an entry for each stopped instance do we need the summary metrics for instance counts?

If metrics are provided for stopped instances, I'd include the summary instance counts as well and even go a step further:

# New generic metrics
lxd_instances_total 32
lxd_containers_total 17
lxd_vms_total 15
lxd_running_instances_total 24
lxd_running_containers_total 13
lxd_running_vms_total 11

That's assuming those counts are cheap to compute, of course.
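
As a rough illustration of how the six counts relate to each other, here is a Python sketch that derives them in one pass over a list of instances. The input shape (dictionaries with `type` and `status` keys, using the values `virtual-machine` and `Running`) and the function name `instance_count_metrics` are assumptions made for this example, not LXD's internal representation.

```python
def instance_count_metrics(instances: list[dict[str, str]]) -> dict[str, int]:
    """Compute total and running counts per instance type in a single pass."""
    counts = {
        "lxd_instances_total": 0,
        "lxd_containers_total": 0,
        "lxd_vms_total": 0,
        "lxd_running_instances_total": 0,
        "lxd_running_containers_total": 0,
        "lxd_running_vms_total": 0,
    }
    for inst in instances:
        # Anything that is not a VM is counted as a container here.
        kind = "vms" if inst["type"] == "virtual-machine" else "containers"
        counts["lxd_instances_total"] += 1
        counts[f"lxd_{kind}_total"] += 1
        if inst["status"] == "Running":
            counts["lxd_running_instances_total"] += 1
            counts[f"lxd_running_{kind}_total"] += 1
    return counts


# Made-up inventory for demonstration.
demo = [
    {"type": "container", "status": "Running"},
    {"type": "container", "status": "Stopped"},
    {"type": "virtual-machine", "status": "Running"},
]
for name, value in instance_count_metrics(demo).items():
    print(name, value)
```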

simondeziel commented 7 months ago

The addition of the lxd_instances metric partly addressed this. What remains is to expose version information about the components bundled with LXD and the features it supports. This should be described in a future ticket once it has been thought through.