Telefonica / prometheus-kafka-adapter

Use Kafka as a remote storage database for Prometheus (remote write only)
Apache License 2.0

"error":"snappy: corrupt input","level":"error","msg":"couldn't decompress body" #40

Closed: SAbdulRahuman closed this issue 4 years ago

SAbdulRahuman commented 4 years ago

I have the following service in my docker-compose.yml file (the full file is further below):

prometheus-kafka-adapter:
    image: telefonica/prometheus-kafka-adapter:1.6.0
    ports:
        - 8080:8080

Error logs:

prometheus-kafka-adapter_1 | {"error":"snappy: corrupt input","level":"error","msg":"couldn't decompress body","time":"2020-05-01T17:21:02Z"} prometheus-kafka-adapter_1 | {"fields.time":"2020-05-01T17:21:02Z","ip":"172.24.0.1","latency":278023,"level":"info","method":"POST","msg":"","path":"/receive","status":400,"time":"2020-05-01T17:21:02Z","user-agent":"Alertmanager/0.20.0"}

palmerabollo commented 4 years ago

Hi @SAbdulRahuman. Thanks for reporting the issue.

I don't know much about how compression works in Kafka brokers and producers, and I'm not using "snappy" compression myself, so I can't help a lot here. The only relevant code is this line, where the KAFKA_COMPRESSION value is set as "compression.codec" in the kafka.ConfigMap.
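For illustration, that mapping amounts to roughly the following (a minimal sketch assuming the confluent-kafka-go client, which is where kafka.ConfigMap comes from; variable names are illustrative, not the adapter's actual code):

package main

import (
    "os"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    // KAFKA_COMPRESSION only controls how the producer compresses messages
    // it sends to the Kafka broker; it is unrelated to how the incoming
    // HTTP request body is handled.
    config := kafka.ConfigMap{
        "bootstrap.servers": os.Getenv("KAFKA_BROKER_LIST"),
        "compression.codec": os.Getenv("KAFKA_COMPRESSION"), // e.g. "none" or "snappy"
    }
    producer, err := kafka.NewProducer(&config)
    if err != nil {
        panic(err)
    }
    defer producer.Close()
}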

Let's see if somebody else can shed some light on it.

jpfe-tid commented 4 years ago

Hi,

Snappy compression is used as part of the Prometheus remote read and write protocol to compress the protocol-buffer-encoded payload sent over HTTP. More details at the link below:

https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations

The attached error comes from the snappy decompression step within the receive handler:

https://github.com/Telefonica/prometheus-kafka-adapter/blob/d430b12d27810ecee67d5eaf6f87f6693c1eac60/handlers.go#L43-L48
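Roughly, that step amounts to the following (a simplified sketch using plain net/http, not the adapter's actual handler; names are illustrative):

package main

import (
    "io/ioutil"
    "net/http"

    "github.com/golang/snappy"
)

func receiveHandler(w http.ResponseWriter, r *http.Request) {
    compressed, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "couldn't read body", http.StatusBadRequest)
        return
    }
    // Prometheus remote_write sends a snappy-compressed protobuf payload.
    // An Alertmanager webhook POST sends plain JSON instead, so this call
    // fails with "snappy: corrupt input", as in the log above.
    uncompressed, err := snappy.Decode(nil, compressed)
    if err != nil {
        http.Error(w, "couldn't decompress body", http.StatusBadRequest)
        return
    }
    _ = uncompressed // the adapter then unmarshals the protobuf WriteRequest from these bytes
}

func main() {
    http.HandleFunc("/receive", receiveHandler)
    http.ListenAndServe(":8080", nil)
}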

In the log line following the error, I can see that the request's User-Agent header is set to Alertmanager. Alertmanager is not compatible with prometheus-kafka-adapter.

Can you share your Prometheus configuration?

SAbdulRahuman commented 4 years ago

Hi @palmerabollo and @jpfe-tid,

I was trying to send alerts from Prometheus to Kafka; the approach is "Prometheus to Alertmanager to prometheus-kafka-adapter to Kafka". There is no need for compression. At first the KAFKA_COMPRESSION value was the default 'none', but I got the same issue, so I tried setting it to 'snappy' since handlers.go performs 'snappy.Decode'.

Please find all the configurations below.

prometheus.yml


global:
    scrape_interval: 15s
    evaluation_interval: 15s
    external_labels:
        monitor: 'my-project'

alerting:
    alertmanagers:
        - static_configs:
              - targets:
                    - 192.168.174.129:9093
          scheme: http
          timeout: 10s

scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 10s
      scrape_timeout: 10s
      metrics_path: /metrics
      static_configs:
          - targets: ['prometheus:9090']

    - job_name: 'node-exporter'
      scrape_interval: 10s
      scrape_timeout: 10s
      metrics_path: /metrics
      static_configs:
          - targets: ['node-exporter:9100']

rule_files:
    - rules/rules.yml
    - rules/alerts*.yml

alertmanager.yml

route:
    group_by: ["alertname", "team"]
    group_wait: 1m
    group_interval: 5m
    repeat_interval: 4h
    receiver: email-logs
    routes:
        - match_re:
              team: (raptors|leafs)
          receiver: email-logs
          continue: True
          routes:
              - match:
                    severity: error
                receiver: email-logs

inhibit_rules:
    - source_match:
          severity: "urgent"
      target_match:
          severity: "warn"
      equal: ["alertname", "instance"]

receivers:
    - name: email-logs
      webhook_configs:
          - url: "http://192.168.174.129:8080/receive"

docker-compose.yml


version: "3.1"

volumes:
    prometheus_data: {}

networks:
    back-tier:

services:
    prometheus:
        image: prom/prometheus:v2.1.0
        container_name: prometheus
        volumes:
            - .:/etc/prometheus/
            - prometheus_data:/prometheus
        command:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
            - "--web.enable-lifecycle"
        ports:
            - 9090:9090
        networks:
            - back-tier
        restart: always

    node-exporter:
        image: prom/node-exporter
        volumes:
            - /proc:/host/proc:ro
            - /sys:/host/sys:ro
            - /:/rootfs:ro
        command:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
            - --collector.filesystem.ignored-mount-points
            - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
        ports:
            - 9100:9100
        networks:
            - back-tier
        restart: always

    alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        volumes:
            - ./alertmanager:/etc/alertmanager
        command:
            - "--config.file=/etc/alertmanager/alertmanager.yml"
            - "--storage.path=/alertmanager"
        restart: always
        ports:
            - "9093:9093"

    prometheus-kafka-adapter:
        image: telefonica/prometheus-kafka-adapter:1.6.0
        ports:
            - 8080:8080
        networks:
            - back-tier
        restart: always
        environment:
            - KAFKA_BROKER_LIST=192.168.174.129:9092
            - KAFKA_TOPIC=first_topic
            - LOG_LEVEL=debug
            - KAFKA_COMPRESSION=snappy

Removed " - KAFKA_COMPRESSION=snappy" as default will be none, but still facing the same issue.

rules.yml


groups:
    - name: rules-demo
      rules:
          - record: job:node_cpu_seconds:usage
            expr: sum without(cpu,mode)(irate(node_cpu_seconds_total{mode!="idle"}[5m])) / count without(cpu)(count without(mode)(node_cpu_seconds_total)) *100
          - alert: CPUUsageAbove20%
            expr: 60 > job:node_cpu_seconds:usage > 20
            #for: 1m
            labels:
                severity: warn
                team: raptors
            annotations:
                description: 'CPU usage on {{ $labels.instance }} has reached {{ $value }}'
                dashboard: 'www.prometheus.io'
          - alert: CPUUsageAbove60%
            expr: job:node_cpu_seconds:usage > 60
            #for: 1m
            labels:
                severity: urgent
                team: raptors, leaf
            annotations:
                description: 'CPU usage has reached 60%'

palmerabollo commented 4 years ago

@jpfe-tid is right. That flow (alertmanager -> prometheus-kafka-adapter) is not supported. Only prometheus -> prometheus-kafka-adapter using remote_write is allowed.
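
For reference, the supported setup points Prometheus itself at the adapter via remote_write in prometheus.yml, roughly like this (a sketch assuming the adapter stays reachable at 192.168.174.129:8080 as in your docker-compose.yml):

remote_write:
    - url: "http://192.168.174.129:8080/receive"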

I'm closing this issue because the flow you describe (receiving alerts via alertmanager's webhook_configs) is out of the scope of this project. It sounds like an interesting project in itself, though.