ccfos / nightingale

An all-in-one observability solution which aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful web UI.
https://flashcat.cloud/docs/
Apache License 2.0
9.63k stars 1.4k forks

[Usage help] nightingale:7.4.1 remote write to Prometheus fails with "duplicate sample for timestamp" #2206

Closed zhuchubo closed 35 minutes ago

zhuchubo commented 2 hours ago

Question and Steps to reproduce

n9e-center runs in a k8s environment, and Prometheus has remote-write enabled. Checking n9e-center, I found that it keeps logging errors. How can I get rid of this "duplicate sample for timestamp" error?

Relevant logs and configurations

2024-09-29 17:00:30.269047 WARNING writer/writer.go:81 example timeseries:labels:<name:"__name__" value:"container_memory_mapped_file" > labels:<name:"ident" value:"lm-sx-cpu-03-306" > labels:<name:"container" > labels:<name:"env" value:"datacenter" > labels:<name:"image" > labels:<name:"instance" value:"127.0.0.1:10250" > labels:<name:"namespace" > labels:<name:"pod" > labels:<name:"region" value:"xinzhou" > labels:<name:"source" value:"categraf" > samples:<timestamp:1727600429000 > 
2024-09-29 17:00:30.312049 WARNING writer/writer.go:130 push data with remote write:http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090/prometheus/api/v1/write request got status code: 400, response body: duplicate sample for timestamp
2024-09-29 17:00:30.312089 WARNING writer/writer.go:80 post to http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090/prometheus/api/v1/write got error: push data with remote write:http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090/prometheus/api/v1/write request got status code: 400, response body: duplicate sample for timestamp

Version

flashcatcloud/nightingale:7.4.1

710leo commented 2 hours ago

@zhuchubo You can refer to this issue: https://github.com/ccfos/nightingale/issues/857

zhuchubo commented 37 minutes ago

@zhuchubo You can refer to this issue: #857

I may have found the problem; I will verify it tonight. Thank you very much!

zhuchubo commented 35 minutes ago

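@zhuchubo You can refer to this issue: #857

categraf is deployed as a DaemonSet. By default it scrapes the kubelet's cadvisor metrics, but the kubelet requires certificate authentication, so the metrics could not be fetched. As a result, the initial data was pushed repeatedly. After disabling the cadvisor input, the error no longer appears.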
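For anyone hitting the same 400 response, here is a minimal Python sketch of how a batch could be checked for the condition Prometheus is complaining about. It assumes the error is raised when one series receives two samples with the same timestamp but different values, and it relies on Prometheus treating a label with an empty value as absent (as with the empty `container`, `image`, `namespace` and `pod` labels in the log above). The `(labels, timestamp_ms, value)` tuple shape, the function names, the sample values and the `another-host` ident are hypothetical and are not Nightingale's or categraf's actual data structures.

```python
from collections import defaultdict


def series_key(labels):
    """Identity of a series: its non-empty labels, independent of order."""
    return frozenset((name, value) for name, value in labels.items() if value != "")


def find_duplicate_samples(batch):
    """Map (series_key, timestamp_ms) to the conflicting values seen for it.

    `batch` is a list of (labels_dict, timestamp_ms, value) tuples. This shape
    is an assumption made for illustration, not a real wire format.
    """
    seen = defaultdict(set)
    for labels, timestamp_ms, value in batch:
        seen[(series_key(labels), timestamp_ms)].add(value)
    # Only keys that received more than one distinct value are conflicts.
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Labels taken from the log line in this issue; the values are made up.
with_empty = {
    "__name__": "container_memory_mapped_file",
    "ident": "lm-sx-cpu-03-306",
    "container": "",
    "pod": "",
}
without_empty = {
    "__name__": "container_memory_mapped_file",
    "ident": "lm-sx-cpu-03-306",
}
other_host = {
    "__name__": "container_memory_mapped_file",
    "ident": "another-host",  # hypothetical second host
}

batch = [
    (with_empty, 1727600429000, 0.0),
    (without_empty, 1727600429000, 4096.0),  # same series once empty labels are dropped
    (other_host, 1727600429000, 0.0),
]

conflicts = find_duplicate_samples(batch)
for (key, timestamp_ms), values in conflicts.items():
    print(dict(key).get("ident"), timestamp_ms, values)
```

A check like this only helps locate which series collide; in this thread the actual fix was disabling the cadvisor input in categraf.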