feiyu563 / PrometheusAlert

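Prometheus Alert is an open-source message-forwarding system for operations alerting. It accepts alert messages from the mainstream monitoring systems Prometheus and Zabbix, the log system Graylog, and the data visualization system Grafana, and forwards them to DingTalk, WeChat, Huawei Cloud SMS, Tencent Cloud SMS, Tencent Cloud voice calls, Alibaba Cloud SMS, Alibaba Cloud voice calls, and more.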
https://feiyu563.gitbook.io
MIT License
2.79k stars 672 forks

The same alert is sent several times #288

Closed Leif160519 closed 1 year ago

Leif160519 commented 1 year ago

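The same alert arrives in Feishu several times. (image) The PrometheusAlert log also shows it printed many times. (image) This happens often and I have not been able to solve it, so any pointers would be appreciated.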

Leif160519 commented 1 year ago

Sometimes the alert is sent once, but most of the time it is sent twice.

feiyu563 commented 1 year ago

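There are usually a few possible causes:
1. Alertmanager routing sends the alert more than once.
2. The aggregated message sent by Alertmanager already contains several duplicate alerts.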
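To check the second possibility, one option is to capture a webhook body that Alertmanager posts and count alerts that are identical once certain labels are ignored. The sketch below assumes the standard Alertmanager webhook JSON shape (a top-level `alerts` list whose items carry `labels`). The function name `duplicate_alerts`, the sample payload, and the choice of `region` as the label to ignore are all hypothetical and only for illustration.

```python
import json
from collections import Counter


def duplicate_alerts(payload, ignore=()):
    """Count alerts in one webhook payload whose labels are identical
    after dropping the label names listed in `ignore`."""
    counts = Counter()
    for alert in payload.get("alerts", []):
        labels = {k: v for k, v in alert.get("labels", {}).items() if k not in ignore}
        # A sorted JSON dump gives a stable, hashable key for a label set.
        counts[json.dumps(labels, sort_keys=True)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical payload: two alerts differ only in the `region` label.
sample_payload = {
    "receiver": "@linux/feishu",
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "HighLoad", "instance": "a", "region": "Tencent"}},
        {"status": "firing", "labels": {"alertname": "HighLoad", "instance": "a", "region": "Other"}},
        {"status": "firing", "labels": {"alertname": "HighLoad", "instance": "b", "region": "Tencent"}},
    ],
}

print(duplicate_alerts(sample_payload))                      # exact label sets
print(duplicate_alerts(sample_payload, ignore=("region",)))  # ignoring `region`
```

If the second call reports entries while the first does not, the payload holds alerts that look the same in the notification but differ in a label the template does not show.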

Leif160519 commented 1 year ago

The current situation is:
1. Alertmanager shows only one alert.
2. The duplicate alerts in Feishu arrive 15 seconds apart.
3. The monitoring metrics are persisted with VictoriaMetrics.

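A few questions and guesses:
1. If the PrometheusAlert configuration sets a default Feishu bot address fsurl=xxx, is every alert also sent to xxx?
2. Could Prometheus and VictoriaMetrics both be sending alerts to Alertmanager at the same time?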

Here is the configuration:

/etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m

  external_labels:
    region: Tencent

remote_write:
  - url: http://10.200.0.188:8428/api/v1/write

rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
    ......
/etc/prometheus/alertmanager.yml

global:
  resolve_timeout: 5m

templates:
  - /etc/prometheus/conf.d/email.tmpl

inhibit_rules:
  - source_match_re: # critical inhibits warning
      severity: critical
    target_match_re:
      severity: warning
    equal: [ all, alertname ]

route:
  group_by: ['alertname','job']
  group_wait: 3m
  group_interval: 5m
  repeat_interval: 24h
  receiver: 'xxx'
  routes:
  - receiver: "@linux/feishu"
    match_re: { channels: "(.*)?@linux/feishu([:/;].*)?" }
    continue: true

receivers:
  - name: 'xxx'
  - name: '@linux/feishu'  # Feishu non-P0 alerts
    webhook_configs:
      - url: 'http://127.0.0.1:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/201a64e9-770d-428c-xxx-xxxxxxx'
        send_resolved: true
...
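As a rough way to test the routing explanation against a config like the one above, the following sketch walks a route tree and lists the receivers an alert would reach. It is a simplified approximation of Alertmanager's routing, assuming only `match_re` matchers (treated as fully anchored regexes), a `receiver` on every route, and `continue` defaulting to false. The function name `matching_receivers` and the alert labels are hypothetical, and the elided parts of the posted config are not modeled.

```python
import re


def matching_receivers(route, labels):
    """Return the receivers an alert with `labels` would be routed to."""
    # A route matches only if every match_re pattern matches the whole label
    # value; a missing label is treated as an empty string.
    for name, pattern in route.get("match_re", {}).items():
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return []
    receivers = []
    for child in route.get("routes", []):
        matched = matching_receivers(child, labels)
        receivers.extend(matched)
        # Without `continue: true`, the first matching child ends the search.
        if matched and not child.get("continue", False):
            break
    # If no child matched, the current route handles the alert itself.
    return receivers or [route["receiver"]]


# Route tree mirroring the visible part of the posted alertmanager.yml.
route_tree = {
    "receiver": "xxx",
    "routes": [
        {
            "receiver": "@linux/feishu",
            "match_re": {"channels": "(.*)?@linux/feishu([:/;].*)?"},
            "continue": True,
        },
    ],
}

# Hypothetical alert labels.
alert_labels = {"alertname": "HighLoad", "job": "node", "channels": "@linux/feishu"}
print(matching_receivers(route_tree, alert_labels))
```

Under these assumptions, a receiver name appearing more than once in the result would point to routing as the source of duplicates. That can happen when `continue: true` lets a later sibling route with the same receiver match as well.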
Leif160519 commented 1 year ago

It seems to have been working normally for the past few days, so I am closing this for now.