aristanetworks / telegraf-cloudvision

Apache License 2.0

Metric names contain duplicated string of the measurement name #3

Closed: dhunteratg closed this issue 1 year ago

dhunteratg commented 1 year ago

Having gotten this running with telegraf using the [[outputs.prometheus_client]] method to host a /metrics page, it looks like all of the metric names contain a duplicated string:

# HELP _components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant Telegraf collected metric
# TYPE _components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant untyped
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b2.ams",host_id="SGD2109xxxx",name="CPU0"} 9
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b2.ams",host_id="SGD2109xxxx",name="CPU1"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b2.ams",host_id="SGD2109xxxx",name="CPU3"} 8
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b3.ams",host_id="SGD2109xxxx",name="CPU0"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b3.ams",host_id="SGD2109xxxx",name="CPU1"} 9
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b4.ams",host_id="SGD2109xxxx",name="CPU0"} 8
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b4.ams",host_id="SGD2109xxxx",name="CPU1"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.b4.ams",host_id="SGD2109xxxx",name="CPU2"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.c4.ams",host_id="SGD2106xxxx",name="CPU0"} 5
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.c4.ams",host_id="SGD2106xxxx",name="CPU1"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.c4.ams",host_id="SGD2106xxxx",name="CPU2"} 9
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf1.c4.ams",host_id="SGD2106xxxx",name="CPU3"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b2.ams",host_id="SGD2108xxxx",name="CPU0"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b2.ams",host_id="SGD2108xxxx",name="CPU1"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b2.ams",host_id="SGD2108xxxx",name="CPU2"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b2.ams",host_id="SGD2108xxxx",name="CPU3"} 3
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b3.ams",host_id="SGD2108xxxx",name="CPU0"} 12
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b3.ams",host_id="SGD2108xxxx",name="CPU1"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b3.ams",host_id="SGD2108xxxx",name="CPU3"} 9
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b4.ams",host_id="SGD2109xxxx",name="CPU1"} 7
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b4.ams",host_id="SGD2109xxxx",name="CPU2"} 8
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.b4.ams",host_id="SGD2109xxxx",name="CPU3"} 4
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.c4.ams",host_id="SGD2108xxxx",name="CPU0"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.c4.ams",host_id="SGD2108xxxx",name="CPU1"} 5
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.c4.ams",host_id="SGD2108xxxx",name="CPU2"} 8
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="leaf2.c4.ams",host_id="SGD2108xxxx",name="CPU3"} 6
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine1.b2.ams",host_id="JPE2207xxxx",name="CPU0"} 14
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine1.b2.ams",host_id="JPE2207xxxx",name="CPU1"} 11
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine1.b2.ams",host_id="JPE2207xxxx",name="CPU3"} 13
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine2.b2.ams",host_id="JPE2208xxxx",name="CPU0"} 4
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine2.b2.ams",host_id="JPE2208xxxx",name="CPU1"} 11
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine2.b2.ams",host_id="JPE2208xxxx",name="CPU2"} 12
_components_component_cpu_utilization_state_instant__components_component_cpu_utilization_state_instant{host="spine2.b2.ams",host_id="JPE2208xxxx",name="CPU3"} 9

Each of these measurements contains two copies of the string "_components_component_cpu_utilization_state_instant". Is there a way to eliminate the duplication within the measurement names?

Thanks

burnyd commented 1 year ago

Can you provide a sample of your telegraf conf file, the EOS versions you are using, and your CVP version? Feel free to redact anything you would like. That way I can test this on my end.

I see how it's appending the labels together. Is it only for the _components_component_cpu_utilization_state_instant label?

dhunteratg commented 1 year ago

@burnyd This is happening for all labels.

CVP: 2022.3.1 EOS: 4.28.6.1M

telegraf.conf:

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})

# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Collection offset is used to shift the collection by the given amount.
  ## This can be be used to avoid many plugins querying constraint devices
  ## at the same time by manually scheduling them in time.
  # collection_offset = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Collected metrics are rounded to the precision specified. Precision is
  ## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  ##
  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s:
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ##
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  precision = "0s"

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  # quiet = false

  ## Log target controls the destination for logs and can be one of "file",
  ## "stderr" or, on Windows, "eventlog".  When set to "file", the output file
  ## is determined by the "logfile" setting.
  # logtarget = "file"

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  # logfile = ""

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0h"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # logfile_rotation_max_archives = 5

  ## Pick a timezone to use when logging or type 'local' for local time.
  ## Example: America/Chicago
  # log_with_timezone = ""

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do no set the "host" tag in the telegraf agent.
  omit_hostname = false

  ## Method of translating SNMP objects. Can be "netsnmp" (deprecated) which
  ## translates by calling external programs snmptranslate and snmptable,
  ## or "gosmi" which translates using the built-in gosmi library.
  # snmp_translator = "netsnmp"

  ## Name of the file to load the state of plugins from and store the state to.
  ## If uncommented and not empty, this file will be used to save the state of
  ## stateful plugins on termination of Telegraf. If the file exists on start,
  ## the state in the file will be restored for the plugins.
  # statefile = ""

[[outputs.prometheus_client]]
  ## Listen on tcp/9273
  listen = ":9273"

  ## Disable the go and process collectors by default
  collectors_exclude = ["gocollector", "process"]

[[inputs.execd]]
  command = ["/arista_cloudvision_telemetry", "-config", "/plugin.conf"]
  signal = "none"

plugin.conf:

[[inputs.arista_cloudvision_telemtry]]
  ## CVP Address
  addresses = "CVP:443"
  ## redial in case of failures after
  redial = "10s"
  enable_tls = true
  cvptoken = "CVPTOKEN"

[[inputs.arista_cloudvision_telemtry.subscription]]
  ## Name of the measurement
  name = "InterfaceCounters"
  origin = "openconfig"
  path = "/"
  subscription_mode = "target_defined"

Thanks!

burnyd commented 1 year ago

Thanks, I can reproduce this.

Using the file output for debugging:

[[outputs.file]]
  files = ["/dev/stdout"]

File output:

/interfaces/interface/state/counters/in_octets,host=DC1-L2LEAF1A,host-id=SN-DC1-L2LEAF1A,name=Management1

Checking prometheus the same way you are

_interfaces_interface_state_counters_in_octets__interfaces_interface_state_counters_in_octets{host="DC1-L2LEAF1A",host_id="SN-DC1-L2LEAF1A",name="Management1"}

It converts the / to _ because slashes are not valid characters in Prometheus metric names. Keep in mind that in this case the measurement and the field are the same: both are the path, i.e. /interfaces/interface/state/counters/in_octets. I am not entirely sure yet why they are appended together within prometheus or why telegraf does this, so I will have to take a look at how the prometheus output builds the name.

burnyd commented 1 year ago

Closing this issue: as discussed with @dhunteratg, this is the outcome of the prometheus plugin acting as normal. Because the measurement and the field key are the same, the prometheus plugin appends them together with a _ in the middle, as noted here

Also shown within the prometheus examples here
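
For anyone landing here later, the naming behaviour described above can be modelled in a few lines. This is only a rough Python sketch of the rule as discussed in this thread (measurement and field key joined with a _, with characters that are invalid in a Prometheus metric name replaced by _). The function names `sanitize` and `prometheus_name` are made up for illustration and are not Telegraf's actual code or API.

```python
import re

# Assumption: this approximates the naming rule discussed in this issue.
# It is not copied from Telegraf's source, and the helper names are hypothetical.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def sanitize(name: str) -> str:
    """Replace every character that is not valid in a Prometheus metric name with '_'."""
    return _INVALID_CHARS.sub("_", name)


def prometheus_name(measurement: str, field_key: str) -> str:
    """Join measurement and field key with '_' and sanitize the result."""
    return sanitize(f"{measurement}_{field_key}")


# The plugin emits the gNMI path as both the measurement and the field key.
path = "/interfaces/interface/state/counters/in_octets"
print(prometheus_name(path, path))
```

Because the path starts with a /, the second copy begins with an underscore of its own, which is where the double underscore in the middle of the exposed metric name comes from.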