fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

PrometheusDuplicateTimestamps errors with log_to_metrics filter starting in fluent-bit 3.1.5 #9413

Open · reneeckstein opened this issue 6 days ago

reneeckstein commented 6 days ago

Bug Report

Describe the bug
After upgrading from fluent-bit 3.1.4 to 3.1.5, all our k8s clusters started reporting PrometheusDuplicateTimestamps errors: the Prometheus expression rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0 is increasing. Prometheus is logging a lot of warnings like this:

ts=2024-09-23T16:22:32.820Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.3.197:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=32
ts=2024-09-23T16:22:38.237Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.4.81:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1876
ts=2024-09-23T16:22:39.697Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.13.208:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4
ts=2024-09-23T16:22:41.643Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.3.110:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
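
For reference, the alert firing here is the standard kube-prometheus PrometheusDuplicateTimestamps rule. A minimal sketch of what that rule typically looks like (the 10m hold period, severity, and any label matchers vary per installation and are assumptions here, not taken from the reporter's cluster):

    - alert: PrometheusDuplicateTimestamps
      expr: rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Prometheus is dropping samples with duplicate timestamps.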

To Reproduce
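
A minimal standalone configuration, sketched here for illustration (it is not the reporter's actual setup, and kubernetes_mode is dropped since the dummy input carries no Kubernetes metadata), that exercises the same log_to_metrics → prometheus_exporter path:

    [SERVICE]
        Flush        1
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    2020

    [INPUT]
        name   dummy
        tag    kube.dummy
        dummy  {"log": "sample log line"}

    [FILTER]
        name                log_to_metrics
        match               kube.*
        tag                 log_counter_metric
        metric_mode         counter
        metric_name         kubernetes_messages
        metric_description  This metric counts Kubernetes messages

    [OUTPUT]
        name   prometheus_exporter
        match  log_counter_metric
        host   0.0.0.0
        port   2021

Scraping http://localhost:2021/metrics repeatedly and checking for metric lines with identical label sets appearing more than once per scrape should show whether the exporter emits duplicate samples.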

Expected behavior
No duplicate metrics on the additional /metrics endpoint exposed by the log_to_metrics feature (usually on port 2021), no warnings in the Prometheus logs, and no PrometheusDuplicateTimestamps errors.

Screenshots

Your Environment

extraPorts:

config:
  service: |
    [SERVICE]
        Flush 1
        Daemon Off
        Log_Level info
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.service.port }}

  inputs: |
    [INPUT]
        Name tail
        Tag kube.*
        Alias tail_container_logs
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        DB /var/log/flb_kube.db
        DB.locking true
        Mem_Buf_Limit 32MB
        Skip_Long_Lines On

  filters: |
    [FILTER]
        Name kubernetes
        Alias kubernetes_all
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Annotations Off
        Buffer_Size 1MB
        Use_Kubelet true

    [FILTER]
        name               log_to_metrics
        match              kube.*
        tag                log_counter_metric
        metric_mode        counter
        metric_name        kubernetes_messages
        metric_description This metric counts Kubernetes messages
        kubernetes_mode    true

  outputs: |
    [OUTPUT]
        name prometheus_exporter
        match log_counter_metric
        host 0.0.0.0
        port 2021
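
The scrape_pool shown in the Prometheus warnings (serviceMonitor/platform-logging/fluent-bit/1) suggests the port-2021 exporter is scraped as the second endpoint of the fluent-bit ServiceMonitor. A rough sketch of such an object, with port names, paths, and selector labels assumed for illustration rather than copied from the reporter's chart values:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: fluent-bit
      namespace: platform-logging
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: fluent-bit
      endpoints:
        - port: http        # endpoint 0: built-in Fluent Bit metrics
          path: /api/v1/metrics/prometheus
        - port: metrics     # endpoint 1: log_to_metrics exporter on port 2021
          path: /metrics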


* Environment name and version (e.g. Kubernetes? What version?):
  * EKS; Kubernetes 1.30
* Server type and version:
* Operating System and version:
  * EKS on Bottlerocket OS 1.22.0 (aws-k8s-1.30), kernel 6.1.106, containerd://1.7.20+bottlerocket
* Filters and plugins:
  * kubernetes, log_to_metrics

Additional context
It is very annoying that every k8s cluster with this common configuration reports PrometheusDuplicateTimestamps errors.

edsiper commented 2 days ago

@reneeckstein are you facing the same issue with v3.1.8? (we have some fixes in place for a similar problem)

reneeckstein commented 2 days ago

@edsiper Yes, we are facing the same issue in fluent-bit v3.1.8. I'm looking forward to v3.1.9; I noticed two metrics-related commits on the master branch.