Open aieri opened 7 months ago
FTR, since #102 the node-exporter metrics set the instance label to the FQDN of the host.
@aieri does #102 solve the issue for you or shall we still add the feature you're asking for? :)
@lucabello #102 helps, but isn't directly related to my request, because it only affects node-exporter metrics. My proposal is about injecting a hostname label into metrics coming from principals related to grafana-agent via the cos-agent relation.
Here's a concrete example. hardware-observer bundles smartctl_exporter. We don't control the upstream project, so we have no direct influence on the metrics it produces. One of the metrics from this exporter may look like the following once it lands in COS:
smartctl_devices{instance="localhost:10201", job="hardware-observer_2_default", juju_application="hardware-observer", juju_model="cou", juju_model_uuid="77ca69e6-dcda-4c31-8dc0-adb4ad632233", juju_unit="hardware-observer/0"} = 3
But which host is this metric about? Yeah, it's from hardware-observer/0, but that's useless to me if I'm getting an alert about a broken drive.
Additionally, being able to rely on hostnames would allow me to create overview dashboards that correlate entries from multiple exporters.
[For the record, a long time ago we had conversations about the legacy prometheus charm injecting FQDNs via reverse DNS (or something along those lines) and COS not supporting it (which I agree is the right choice). While hostnames (especially cat /etc/hostname) are not universally unique, for all intents and purposes they often are within a deployment, so having grafana-agent inject them as a label could be a decent low-tech workaround, I think.]
Enhancement Proposal
Operators of Juju environments have several ways to refer to charm "targets" (i.e. Ubuntu machines, LXDs, VMs, and containers): machine IDs, hostnames, and unit names (principals and subordinates). Depending on the use case, one signifier may be preferable to another when identifying a metric source.
When monitoring OpenStack hypervisors, we have at our disposal metrics generated on the hypervisor itself by the libvirt exporter (bundled in either https://charmhub.io/prometheus-libvirt-exporter or https://charmhub.io/openstack-hypervisor), and metrics provided by the OpenStack API about the hypervisors (these come from either https://charmhub.io/openstack-exporter or https://charmhub.io/openstack-exporter-k8s). Since both types of metrics ultimately represent aspects of the same entity, it is useful to display them in the same dashboard. Unfortunately, we are unable to correlate them: the former set declares its source to be a Juju unit, while the latter identifies the source via a hostname, since the OpenStack API obviously has no knowledge of Juju and charms.
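To illustrate the correlation problem, here is a minimal Python sketch of the join that a shared hostname label would make possible. It is only an illustration: the sample structure, the `join_on_hostname` helper, and the example label values are hypothetical and do not come from either exporter.

```python
def join_on_hostname(left, right):
    """Pair samples from two metric sets that carry the same `hostname` label.

    Each sample is a dict with a `labels` dict and a `value`. Samples
    without a `hostname` label cannot be matched and are skipped.
    """
    right_by_host = {
        sample["labels"]["hostname"]: sample
        for sample in right
        if "hostname" in sample["labels"]
    }
    pairs = []
    for sample in left:
        host = sample["labels"].get("hostname")
        if host in right_by_host:
            pairs.append((host, sample["value"], right_by_host[host]["value"]))
    return pairs


# Hypothetical samples: one identified by Juju topology plus a hostname,
# one identified by hostname only (as the OpenStack API would report it).
hypervisor_side = [
    {"labels": {"juju_unit": "example-hypervisor/0", "hostname": "compute-1"}, "value": 4},
    {"labels": {"juju_unit": "example-hypervisor/1"}, "value": 7},  # no hostname: unmatched
]
api_side = [
    {"labels": {"hostname": "compute-1"}, "value": 1},
]

matched = join_on_hostname(hypervisor_side, api_side)
```

Only the first hypervisor-side sample can be paired here; the second one carries a Juju unit name alone, which is the situation described above.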
Additionally, metrics displayed by COS may be consumed by users of the infrastructure who - like the OpenStack API - are oblivious to Juju unit names.
As a way to solve the correlation problem mentioned above, as well as to provide a meaningful filtering identifier to individuals who aren't Juju operators, it would be very useful if the grafana-agent machine charm injected the content of /etc/hostname in a hostname label for any metric generated by charms related to it.
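A minimal sketch of the requested behaviour, in Python for illustration only. It shows the label transformation, not how the charm would wire it into grafana-agent; `read_hostname` and `inject_hostname` are hypothetical names, and falling back to `socket.gethostname()` when the file is unreadable is an assumption rather than part of the proposal.

```python
import socket
from pathlib import Path


def read_hostname(path="/etc/hostname"):
    """Return the stripped content of the hostname file.

    Assumption: if the file is missing or empty, fall back to the
    hostname reported by the operating system.
    """
    try:
        name = Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        name = ""
    return name or socket.gethostname()


def inject_hostname(labels, hostname):
    """Return a copy of `labels` with a `hostname` label added.

    An existing `hostname` label is left untouched, so exporters that
    already set one are not overridden.
    """
    merged = dict(labels)
    merged.setdefault("hostname", hostname)
    return merged


# Labels taken from the smartctl_devices example in this issue.
original = {
    "instance": "localhost:10201",
    "job": "hardware-observer_2_default",
    "juju_application": "hardware-observer",
    "juju_model": "cou",
    "juju_unit": "hardware-observer/0",
}
labelled = inject_hostname(original, read_hostname())
```

With this in place, the smartctl_devices series would carry the machine's hostname alongside the Juju topology labels, so an alert about a broken drive could name the host directly.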