dhunteratg closed 1 year ago
Can you provide a sample of your Telegraf conf file, the EOS version you are using, and your CVP version? Feel free to redact anything you would like. I just want to test this on my end.
I see how it's appending the labels together. Is it only for the _components_component_cpu_utilization_state_instant label?
@burnyd This is happening for all labels.
CVP: 2022.3.1 EOS: 4.28.6.1M
telegraf.conf:
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Collection offset is used to shift the collection by the given amount.
## This can be used to avoid many plugins querying constraint devices
## at the same time by manually scheduling them in time.
# collection_offset = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## Collected metrics are rounded to the precision specified. Precision is
## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
##
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s:
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
##
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
precision = "0s"
## Log at debug level.
# debug = false
## Log only error level messages.
# quiet = false
## Log target controls the destination for logs and can be one of "file",
## "stderr" or, on Windows, "eventlog". When set to "file", the output file
## is determined by the "logfile" setting.
# logtarget = "file"
## Name of the file to be logged to when using the "file" logtarget. If set to
## the empty string then logs are written to stderr.
# logfile = ""
## The logfile will be rotated after the time interval specified. When set
## to 0 no time based rotation is performed. Logs are rotated only when
## written to, if there is no log activity rotation may be delayed.
# logfile_rotation_interval = "0h"
## The logfile will be rotated when it becomes larger than the specified
## size. When set to 0 no size based rotation is performed.
# logfile_rotation_max_size = "0MB"
## Maximum number of rotated archives to keep, any older logs are deleted.
## If set to -1, no archives are removed.
# logfile_rotation_max_archives = 5
## Pick a timezone to use when logging or type 'local' for local time.
## Example: America/Chicago
# log_with_timezone = ""
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
## Method of translating SNMP objects. Can be "netsnmp" (deprecated) which
## translates by calling external programs snmptranslate and snmptable,
## or "gosmi" which translates using the built-in gosmi library.
# snmp_translator = "netsnmp"
## Name of the file to load the state of plugins from and store the state to.
## If uncommented and not empty, this file will be used to save the state of
## stateful plugins on termination of Telegraf. If the file exists on start,
## the state in the file will be restored for the plugins.
# statefile = ""
[[outputs.prometheus_client]]
## Listen on tcp/9273
listen = ":9273"
## Disable the go and process collectors by default
collectors_exclude = ["gocollector", "process"]
[[inputs.execd]]
command = ["/arista_cloudvision_telemetry", "-config", "/plugin.conf"]
signal = "none"
plugin.conf:
[[inputs.arista_cloudvision_telemtry]]
## CVP Address
addresses = "CVP:443"
## redial in case of failures after
redial = "10s"
enable_tls = true
cvptoken = "CVPTOKEN"
[[inputs.arista_cloudvision_telemtry.subscription]]
## Name of the measurement
name = "InterfaceCounters"
origin = "openconfig"
path = "/"
subscription_mode = "target_defined"
Thanks!
Thanks, I can duplicate this.
Using the file output for debugging:
[[outputs.file]]
files = ["/dev/stdout"]
File output
/interfaces/interface/state/counters/in_octets,host=DC1-L2LEAF1A,host-id=SN-DC1-L2LEAF1A,name=Management1
Checking prometheus the same way you are
_interfaces_interface_state_counters_in_octets__interfaces_interface_state_counters_in_octets{host="DC1-L2LEAF1A",host_id="SN-DC1-L2LEAF1A",name="Management1"}
The / is converted to _ because of the way Prometheus handles slashes within PromQL. Keep in mind that in this case the measurement and the field are the same: both are the path, i.e. /interfaces/interface/state/counters/in_octets. I am not entirely sure why they are appended together within Prometheus or why Telegraf does this, so I will have to take a look at how the name is built.
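The doubled name can be reproduced with a small sketch. It assumes (this is not confirmed in the thread) that the Prometheus output forms the metric name by joining the measurement and the field with an underscore and then replacing every character that is not valid in a metric name with an underscore. `prometheus_name` is a hypothetical helper written for illustration, not Telegraf code, and the character rule is simplified.

```python
import re

# Simplified model: anything outside [a-zA-Z0-9_] is treated as invalid.
# (Real Prometheus metric names may also contain ':'.)
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def prometheus_name(measurement: str, field: str) -> str:
    """Join measurement and field with '_' and replace invalid characters.

    Assumption: this mirrors how the exported name is derived; it is a
    sketch, not the actual implementation.
    """
    return _INVALID.sub("_", f"{measurement}_{field}")


# In the file output above, the measurement and the field are the same path.
path = "/interfaces/interface/state/counters/in_octets"
print(prometheus_name(path, path))
```

Under that assumption, the path appears twice because it is used both as the measurement and as the field, and the double underscore in the middle comes from the joining underscore followed by the field's leading slash.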
Having gotten this running with Telegraf using the [[outputs.prometheus_client]] method to host a /metrics page, it looks like all of the metric names contain a duplicated string:
Each of these measurements contains two copies of the string "_components_component_cpu_utilization_state_instant". Is there a way to eliminate the duplication within the measurement names?
Thanks