categraf collecting switches over SNMP: Nightingale reports an error when agent_host_tag = "ident" #1976

Closed: TotalAnnihilation closed this issue 1 week ago

TotalAnnihilation commented 3 months ago

Your config.toml

[Global]
RunMode = "release"

[Log]
# log write dir
Dir = "logs"
Level = "DEBUG"
# stdout, stderr, file
Output = "file"
# # rotate by time
# KeepHours = 4
# # rotate by size
RotateNum = 3
# # unit: MB
RotateSize = 256

[HTTP]
# http listening address
Host = ""
# http listening port
Port = 17000
# https cert file path
CertFile = ""
# https key file path
KeyFile = ""
# whether print access log
PrintAccessLog = false
# whether enable pprof
PProf = false
# expose prometheus /metrics?
ExposeMetrics = true
# http graceful shutdown timeout, unit: s
ShutdownTimeout = 30
# max content length: 64M
MaxContentLength = 67108864
# http server read timeout, unit: s
ReadTimeout = 20
# http server write timeout, unit: s
WriteTimeout = 40
# http server idle timeout, unit: s
IdleTimeout = 120

[HTTP.ShowCaptcha]
Enable = false

[HTTP.APIForAgent]
Enable = true
# [HTTP.APIForAgent.BasicAuth]
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"

[HTTP.APIForService]
Enable = true
[HTTP.APIForService.BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"

[HTTP.JWTAuth]
# signing key
SigningKey = "5b94a0fd640fe2765af826acfe42d151"
# unit: min
AccessExpired = 1500
# unit: min
RefreshExpired = 10080
RedisKeyPrefix = "/jwt/"

[HTTP.ProxyAuth]
# if proxy auth enabled, jwt auth is disabled
Enable = false
# username key in http proxy header
HeaderUserNameKey = "X-User-Name"
DefaultRoles = ["Standard"]

[HTTP.RSA]
# open RSA
OpenRSA = false

[DB]
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
# postgres: DSN="host= port=5432 user=root dbname=n9e_v6 password=1234 sslmode=disable"

# enable debug mode or not
Debug = false
# mysql postgres
DBType = "mysql"
# unit: s
MaxLifetime = 7200
# max open connections
MaxOpenConns = 150
# max idle connections
MaxIdleConns = 50
# table prefix
TablePrefix = ""
# enable auto migrate or not
# EnableAutoMigrate = false

[Redis]
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = ""
# Username = ""
# Password = ""
# DB = 0
# UseTLS = false
# TLSMinVersion = "1.2"
# standalone cluster sentinel
RedisType = "standalone"
# Mastername for sentinel type
# MasterName = "mymaster"
# SentinelUsername = ""
# SentinelPassword = ""

[Alert]
[Alert.Heartbeat]
# auto detect if blank
IP = ""
# unit ms
Interval = 1000
EngineName = "default"

# [Alert.Alerting]
# NotifyConcurrency = 10

[Center]
MetricsYamlFile = "./etc/metrics.yaml"
I18NHeaderKey = "X-Language"

[Center.AnonymousAccess]
PromQuerier = true
AlertDetail = true

[Pushgw]
# use target labels in database instead of in series
LabelRewrite = true
# # default busigroup key name
# BusiGroupLabelKey = "busigroup"
ForceUseServerTS = true

# [Pushgw.DebugSample]
# ident = "xx"
# __name__ = "xx"

# [Pushgw.WriterOpt]
# QueueMaxSize = 1000000
# QueuePopSize = 1000

[[Pushgw.Writers]]
# Url = ""
Url = ""
# Basic auth username
BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
## Optional TLS Config
# UseTLS = false
# TLSCA = "/etc/n9e/ca.pem"
# TLSCert = "/etc/n9e/cert.pem"
# TLSKey = "/etc/n9e/key.pem"
# InsecureSkipVerify = false
# [[Writers.WriteRelabels]]
# Action = "replace"
# SourceLabels = ["__address__"]
# Regex = "([^:]+)(?::\\d+)?"
# Replacement = "$1:80"
# TargetLabel = "__address__"

Relevant logs

Jun  3 18:25:32 ecs-nightingale n9e: 2024-06-03 18:25:32.107689 WARNING writer/writer.go:80 post to got error: push data with remote write: request got status code: 500, response body: label name "ident" is not unique: invalid sample
Jun  3 18:25:32 ecs-nightingale n9e: 2024-06-03 18:25:32.107705 WARNING writer/writer.go:81 example timeseries:labels:<name:"__name__" value:"snmp_icmp_up" > labels:<name:"region" value:"Z3" > labels:<name:"infosys" value:"\345\237\272\347\241\200\350\256\276\346\226\275" > labels:<name:"product" value:"\345\215\232\347\247\221\345\205\211\347\272\244\344\272\244\346\215\242\346\234\272" > labels:<name:"ident" value:"" > labels:<name:"ident" value:"n9e\346\234\215\345\212\241\345\231\250" > samples:<value:1 timestamp:1717410331000 >

System info

CentOS Linux release 7.9.2009 (Core)

Steps to reproduce

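When categraf collects from switches over SNMP with the field agent_host_tag = "ident" set, Nightingale reports the error above. Commenting out this field lets collection work normally, but then ident is the IP of the VM running categraf rather than the switch's IP, which makes it hard to manage the switches with business groups.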
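For reference, a minimal sketch of the kind of categraf SNMP instance described above, assuming the plugin config lives in `conf/input.snmp/snmp.toml`. The agent address, community string, OID and label values are placeholders, not taken from the reporter's actual setup.

```toml
# conf/input.snmp/snmp.toml (assumed path); all values are placeholders
[[instances]]
agents = ["udp://192.0.2.10:161"]   # hypothetical switch address
version = 2
community = "public"
# Writes the polled switch address into the `ident` label. The data categraf
# sends already carries an ident for the categraf host, so the series ends up
# with two `ident` labels and the remote write backend rejects it.
agent_host_tag = "ident"
labels = { region = "Z3" }

[[instances.field]]
oid = ".1.3.6.1.2.1.1.3.0"   # SNMPv2-MIB::sysUpTime.0
name = "uptime"
```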

Expected behavior

Actual behavior

Additional info

UlricQin commented 1 week ago

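The error occurs because data collected by categraf already carries an ident label identifying the machine categraf runs on. You can edit categraf's config.toml and set the option that omits the hostname (`omit_hostname`) to true.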
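A sketch of that change, assuming the option meant is `omit_hostname` in the `[global]` section of categraf's `conf/config.toml` (other keys omitted):

```toml
# categraf conf/config.toml (assumed layout), other [global] keys omitted
[global]
# When true, categraf does not attach its own hostname label to the series,
# leaving the ident written by agent_host_tag as the only one.
omit_hostname = true
```

This is a global switch: it applies to every plugin in that categraf instance, so host metrics collected by the same categraf would lose their hostname label too. Running the SNMP collection in a dedicated categraf instance is one way to keep the two apart.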

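Also, the commercial edition has dedicated network-device management. With the open-source edition, setting agent_host_tag to ident for switches is strongly discouraged: the host list is meant for machines, not switches. For switches, we recommend classifying and managing them with labels instead.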
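Along the lines of the label-based approach, a sketch that leaves `ident` to categraf and stores the switch address under its own label. The label name `switch_ip` and the label keys and values are hypothetical.

```toml
# conf/input.snmp/snmp.toml (assumed path); names and values are hypothetical
[[instances]]
agents = ["udp://192.0.2.10:161"]
version = 2
community = "public"
# The SNMP plugin records the polled agent's address under this label name,
# so it no longer collides with categraf's own ident.
agent_host_tag = "switch_ip"
# Static labels used to group and filter switches instead of business groups.
labels = { region = "Z3", device_type = "switch" }
```

Alert rules and dashboards can then filter on `switch_ip`, `region` or `device_type` rather than relying on the host list.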

If you still have questions, feel free to open an issue at  .

TotalAnnihilation commented 2 days ago


