QubitProducts / exporter_exporter

A reverse proxy designed for Prometheus exporters
Apache License 2.0

feat: extended labels #95

Closed thenodon closed 11 months ago

thenodon commented 11 months ago

Inserting labels on metrics based on label name and label value

This PR may be out of scope for what you think exporter_exporter is targeted at, but I have a use case where I need to insert additional labels based on existing labels, matching on the label name and label value. The use case is specific to the snmp_exporter, but should be applicable to any exporter that is configured as a proxy.

In this use case we need to add additional labels for a specific target and its interface(s). The source of this information would be some kind of discovery, for example a network management system. Prometheus discovery alone is not enough, since discovery-based labels can only be created for the target, not for specific "individuals" that are identified by a label name and label value, like an interface index.

I have called the additional logic in the exporter_exporter "extended labels". In expexp.yaml the following 3 additional parameters need to be configured to achieve this:

defaultModule: snmp
modules:
  snmp:
    method: http
    http:
      port: 9116
      path: /snmp
      label_extend: true
      label_extend_target_url_identity: target
      label_extend_path: /tmp/interfaces.yaml

Setting label_extend to true enables the feature for the module. The second parameter, label_extend_target_url_identity, defines which URL query parameter identifies the target that the extended labels should be matched against; for the snmp_exporter, the URL parameter target defines the host that snmp should be executed against. The third, label_extend_path, defines the path to the extended labels configuration, typically created by some automation.

The configuration file, referenced by label_extend_path, has the following format, using an example that is applicable, but not limited, to the snmp-exporter use case:

extended_labels:
  foo.bar.org:
    labels:
    -
      label_key_name: "ifIndex"
      extended_labels:
        -
          metric_match: ifOperStatus
          match_label_key_values:
            "*":
          default_label_pairs:
            type: l2switch
        -
          metric_match: if(Admin|Oper)Status
          match_label_key_values:
            "370":
              label_pairs:
                 trunk: true
                 endpoint: 172.25.1.19:13
            "371":
              label_pairs:
                trunk: true
                endpoint: 172.25.1.13:45
    -
      label_key_name: "iftype"
      extended_labels:
        - match_label_key_values:
            "mpls":
              label_pairs:
                external: true
  abc.xyz.org:
    labels:
     .......

So extended_labels is a map of targets, in the above example host names. Each entry is matched against the value of the URL query parameter defined by label_extend_target_url_identity. If a metric has a label named ifIndex, defined by label_key_name, the logic evaluates it against what is configured in extended_labels. First it checks whether the rule is restricted to specific metrics, for example ifOperStatus. The metric_match value is a regular expression and defaults to .* if not defined. The match_label_key_values map defines which label values to match against. In the first entry the match is done against all values of ifIndex, *, and a label called type with the value l2switch is added. In the second entry both ifOperStatus and ifAdminStatus are matched, but the labels trunk and endpoint are only added for metrics where ifIndex has the value 370 or 371.
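The matching described above can be sketched roughly as follows. The type and field names are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// ExtendedLabel mirrors one entry under extended_labels in the config
// above: an optional metric_match regexp, a map of label values to the
// label pairs they add, and default pairs applied on a "*" match.
type ExtendedLabel struct {
	MetricMatch       string                       // regexp; empty means ".*"
	MatchLabelValues  map[string]map[string]string // label value -> pairs to add
	DefaultLabelPairs map[string]string            // added when "*" matches
}

// extraLabels returns the label pairs to add for a sample with the given
// metric name and value of the configured label key (e.g. ifIndex).
func extraLabels(e ExtendedLabel, metric, labelValue string) map[string]string {
	pat := e.MetricMatch
	if pat == "" {
		pat = ".*"
	}
	// Anchor the pattern so metric_match must cover the whole metric name.
	if !regexp.MustCompile("^(?:" + pat + ")$").MatchString(metric) {
		return nil
	}
	if pairs, ok := e.MatchLabelValues[labelValue]; ok {
		return pairs
	}
	if _, ok := e.MatchLabelValues["*"]; ok {
		return e.DefaultLabelPairs
	}
	return nil
}

func main() {
	e := ExtendedLabel{
		MetricMatch: "if(Admin|Oper)Status",
		MatchLabelValues: map[string]map[string]string{
			"370": {"trunk": "true", "endpoint": "172.25.1.19:13"},
		},
	}
	fmt.Println(extraLabels(e, "ifOperStatus", "370")["endpoint"]) // 172.25.1.19:13
	fmt.Println(len(extraLabels(e, "ifInOctets", "370")))          // 0

	wildcard := ExtendedLabel{
		MetricMatch:       "ifOperStatus",
		MatchLabelValues:  map[string]map[string]string{"*": nil},
		DefaultLabelPairs: map[string]string{"type": "l2switch"},
	}
	fmt.Println(extraLabels(wildcard, "ifOperStatus", "42")["type"]) // l2switch
}
```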

Most of the logic is in the function getLabelExtendReverseProxyModifyResponseFunc in the http.go file. This function is used instead of getReverseProxyModifyResponseFunc if label_extend is set to true.

I also added watcher.go, which manages dynamic reload of the file defined by label_extend_path.

I have not updated README.md, but will if you think this PR is applicable for exporter_exporter.

I can fully understand if you find this functionality out of scope for exporter_exporter. If that is the case, I hope it would be okay if I create a separate repo for it, with credit to exporter_exporter and the additional code under the same license.

tcolgate commented 11 months ago

In the general case, I'm not hugely convinced of the need for this. You can achieve the same by exposing some data via the node-exporter files plugin that can then be joined with the data scraped from the snmp_exporter at query time. A certain amount may also be achievable on the Prometheus side via metric relabel rules. On the specifics of the PR, I'm not overly keen on the syntax used: extended_labels feels like it appears in too many places, and having the config in a separate file seems like overkill (if your automation can generate that file, it can generate the expexp config). If I'm missing something, and this can't be achieved by providing info-style metrics, then I'd suggest it is better suited to being its own tool.

thenodon commented 11 months ago

Hi @tcolgate, and thanks for your answer. I have looked into the solutions you mention, but I do not see them as an option. This is an environment with approx. 5000 network devices; using some method to create relabeling rules for a specific device and its interface indexes would create a Prometheus monster config. A proxy solution is, I think, the best option since it operates on the data in transit, but I fully respect that you may consider it out of scope for exporter_exporter.

You are right that extended_labels is used twice; that is an easy fix of course, like generating it all into a single expexp.yml config file. I just wanted to keep it separated since I did not know your preferences. The info-metric approach might be possible, but joining over long time intervals makes the queries very slow compared to enriching the metrics directly. The proxy approach also lets interfaces be dropped easily with metric relabeling, since in our use case approx. 20% of all interfaces are of interest from a monitoring perspective. I'm totally fine with forking, I just wanted to check first.