ccfos / nightingale

An all-in-one observability solution which aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful web UI.
https://flashcat.cloud/docs/
Apache License 2.0
9.63k stars 1.4k forks

When categraf collects switch metrics over SNMP, setting agent_host_tag = "ident" causes Nightingale to report an error #1976

Closed TotalAnnihilation closed 1 week ago

TotalAnnihilation commented 3 months ago

Your config.toml

[Global]
RunMode = "release"

[Log]
# log write dir
Dir = "logs"
# log level: DEBUG INFO WARNING ERROR
Level = "DEBUG"
# stdout, stderr, file
Output = "file"
# # rotate by time
# KeepHours = 4
# # rotate by size
RotateNum = 3
# # unit: MB
RotateSize = 256

[HTTP]
# http listening address
Host = "0.0.0.0"
# http listening port
Port = 17000
# https cert file path
CertFile = ""
# https key file path
KeyFile = ""
# whether print access log
PrintAccessLog = false
# whether enable pprof
PProf = false
# expose prometheus /metrics?
ExposeMetrics = true
# http graceful shutdown timeout, unit: s
ShutdownTimeout = 30
# max content length: 64M
MaxContentLength = 67108864
# http server read timeout, unit: s
ReadTimeout = 20
# http server write timeout, unit: s
WriteTimeout = 40
# http server idle timeout, unit: s
IdleTimeout = 120

[HTTP.ShowCaptcha]
Enable = false 

[HTTP.APIForAgent]
Enable = true 
# [HTTP.APIForAgent.BasicAuth]
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"

[HTTP.APIForService]
Enable = true 
[HTTP.APIForService.BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"

[HTTP.JWTAuth]
# signing key
SigningKey = "5b94a0fd640fe2765af826acfe42d151"
# unit: min
AccessExpired = 1500
# unit: min
RefreshExpired = 10080
RedisKeyPrefix = "/jwt/"

[HTTP.ProxyAuth]
# if proxy auth enabled, jwt auth is disabled
Enable = false
# username key in http proxy header
HeaderUserNameKey = "X-User-Name"
DefaultRoles = ["Standard"]

[HTTP.RSA]
# open RSA
OpenRSA = false

[DB]
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
# postgres: DSN="host=127.0.0.1 port=5432 user=root dbname=n9e_v6 password=1234 sslmode=disable"
DSN="root:1234@tcp(127.0.0.1:3306)/n9e_v5?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"

# enable debug mode or not
Debug = false
# mysql postgres
DBType = "mysql"
# unit: s
MaxLifetime = 7200
# max open connections
MaxOpenConns = 150
# max idle connections
MaxIdleConns = 50
# table prefix
TablePrefix = ""
# enable auto migrate or not
# EnableAutoMigrate = false

[Redis]
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = "127.0.0.1:6379"
# Username = ""
# Password = ""
# DB = 0
# UseTLS = false
# TLSMinVersion = "1.2"
# standalone cluster sentinel
RedisType = "standalone"
# Mastername for sentinel type
# MasterName = "mymaster"
# SentinelUsername = ""
# SentinelPassword = ""

[Alert]
[Alert.Heartbeat]
# auto detect if blank
IP = ""
# unit ms
Interval = 1000
EngineName = "default"

# [Alert.Alerting]
# NotifyConcurrency = 10

[Center]
MetricsYamlFile = "./etc/metrics.yaml"
I18NHeaderKey = "X-Language"

[Center.AnonymousAccess]
PromQuerier = true
AlertDetail = true

[Pushgw]
# use target labels in database instead of in series
LabelRewrite = true
# # default busigroup key name
# BusiGroupLabelKey = "busigroup"
ForceUseServerTS = true

# [Pushgw.DebugSample]
# ident = "xx"
# __name__ = "xx"

# [Pushgw.WriterOpt]
# QueueMaxSize = 1000000
# QueuePopSize = 1000

[[Pushgw.Writers]] 
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://127.0.0.1:9090/api/v1/write"
# Basic auth username
BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
## Optional TLS Config
# UseTLS = false
# TLSCA = "/etc/n9e/ca.pem"
# TLSCert = "/etc/n9e/cert.pem"
# TLSKey = "/etc/n9e/key.pem"
# InsecureSkipVerify = false
# [[Writers.WriteRelabels]]
# Action = "replace"
# SourceLabels = ["__address__"]
# Regex = "([^:]+)(?::\\d+)?"
# Replacement = "$1:80"
# TargetLabel = "__address__"

Relevant logs

Jun  3 18:25:32 ecs-nightingale n9e: 2024-06-03 18:25:32.107689 WARNING writer/writer.go:80 post to http://127.0.0.1:9090/api/v1/write got error: push data with remote write:http://127.0.0.1:9090/api/v1/write request got status code: 500, response body: label name "ident" is not unique: invalid sample
Jun  3 18:25:32 ecs-nightingale n9e: 2024-06-03 18:25:32.107705 WARNING writer/writer.go:81 example timeseries:labels:<name:"__name__" value:"snmp_icmp_up" > labels:<name:"region" value:"Z3" > labels:<name:"infosys" value:"\345\237\272\347\241\200\350\256\276\346\226\275" > labels:<name:"product" value:"\345\215\232\347\247\221\345\205\211\347\272\244\344\272\244\346\215\242\346\234\272" > labels:<name:"ident" value:"10.1.8.202" > labels:<name:"ident" value:"n9e\346\234\215\345\212\241\345\231\250" > samples:<value:1 timestamp:1717410331000 >

System info

CentOS Linux release 7.9.2009 (Core)

Steps to reproduce

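When categraf collects switch metrics over SNMP, setting the field agent_host_tag = "ident" causes Nightingale to report an error. Commenting the field out lets collection work normally, but then ident is the IP of the virtual machine rather than the IP of the switch, which makes the switches hard to manage with business groups.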
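
For illustration, the setup described might look like the following fragment of categraf's SNMP input configuration. This is a hypothetical sketch, not the reporter's actual file: the file path, SNMP version, and community string are assumptions, and the agent address is taken from the first `ident` value in the log sample.

```toml
# conf/input.snmp/snmp.toml (hypothetical path and fragment)
[[instances]]
# Switch to poll; address assumed from the "ident" value in the log sample
agents = ["udp://10.1.8.202:161"]
version = 2            # assumed
community = "public"   # placeholder

# Writes the agent address into the "ident" label. With this line enabled,
# the series ends up with two "ident" labels and remote write rejects it.
agent_host_tag = "ident"
```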

Expected behavior

Actual behavior

Additional info

No response

UlricQin commented 1 week ago

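The error occurs because everything categraf collects already carries an ident label that identifies the machine categraf is running on. You can edit categraf's config.toml and set the option that omits the hostname to true.

Also, the commercial edition has dedicated network device management. For the open-source edition, setting agent_host_tag to ident for switches is strongly discouraged: the machine list is meant for machines, not for switches. For switches, the recommendation is still to use labels for classification and management.

If you have further questions, feel free to open an issue at github.com/flashcatcloud/categraf.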
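
As a sketch of that change, assuming the option meant is `omit_hostname` in the `[global]` section of categraf's config.toml (the reply does not name the key, so check it against your categraf version):

```toml
# categraf config.toml (fragment)
[global]
# Assumption: this is the "omit hostname" switch the reply refers to.
# When true, categraf should stop attaching its own host label to series,
# leaving only the value written by agent_host_tag in the SNMP input.
omit_hostname = true
```

Because the setting sits in the global section, it would presumably affect every plugin run by that categraf instance, so a separate categraf instance dedicated to SNMP collection may be a safer place to apply it.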

TotalAnnihilation commented 2 days ago

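Thanks for the answer. However, labels alone cannot separate devices by business group. The switches are not all managed by one person, so I would like to use business groups to assign them to different people.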