feiyu563 / PrometheusAlert

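Prometheus Alert is an open-source message-forwarding hub for operations alerts. It accepts alert messages from the mainstream monitoring systems Prometheus and Zabbix, the log system Graylog and the data-visualization system Grafana, and delivers them via DingTalk, WeChat, Huawei Cloud SMS, Tencent Cloud SMS, Tencent Cloud voice calls, Alibaba Cloud SMS, Alibaba Cloud voice calls, and more.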
https://feiyu563.gitbook.io
MIT License
2.87k stars, 681 forks

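[dingdingSign] DingTalk signing is enabled in the config file, but the secret signing parameter parsed from the DingTalk robot URL is empty; the unsigned URL will be used! #363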

Open MagicStarTrace opened 11 months ago

MagicStarTrace commented 11 months ago

In Alertmanager I configured alert grouping (different host groups, each @-mentioning different contacts) in webhook_configs:

prometheus-alert configuration:

#---------------------↓webhook-----------------------
# Whether to enable the DingTalk alert channel; several channels can be enabled at once. 0 = off, 1 = on
open-dingding=1
open-dingding-secret=1
# Default DingTalk robot URL
ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxx

Now the prometheus-alert console reports this error:

2023/12/25 10:11:17.590 [I] [prometheusalert.go:456]  [1703470277467171652] [dingding] {"errcode":310000,"errmsg":"description:机器人发送签名不匹配;solution:请确认签名和生成签名的时间戳必须都放在调用的网址中,请确认机器人的密钥加密和填写正确;link:请参考本接口对应文档获得具体要求,或者在https://open.dingtalk.com/document/  搜索相关文档;"}
2023/12/25 10:11:17.590 [D] [server.go:2936]  |     10.0.0.139| 200 | 123.289154ms|   match| POST     /prometheusalert   r:/prometheusalert
2023/12/25 10:16:17.468 [D] [value.go:586]  [1703470577468584110] {"receiver":"IT","status":"firing","alerts":[{"status":"firing","labels":{"alertgroup":"Node rules","alertname":"服务器内存利用率超过90%","app_name":"ks-alert","host":"192.168.110.236","instance":"192.168.110.236","severity":"critical"},"annotations":{"description":"Memory usage is more than 90%\n  VALUE = 85.8005143174\n  LABELS = map[host:192.168.110.236]","message":"服务器:192.168.110.236 内存使用率 \u003e 90% ,当前值:85.80% 。"},"startsAt":"2023-12-22T10:14:28.088890684Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://vmalert-vmalert-10-58c64b78cc-7ffg5:8080/vmalert/api/v1/alert?group_id=16095307156649292636\u0026alert_id=12075814400321301066","fingerprint":"bf5ff2faf81b564c"}],"groupLabels":{"instance":"192.168.110.236"},"commonLabels":{"alertgroup":"Node rules","alertname":"服务器内存利用率超过90%","app_name":"ks-alert","host":"192.168.110.236","instance":"192.168.110.236","severity":"critical"},"commonAnnotations":{"description":"Memory usage is more than 90%\n  VALUE = 85.8005143174\n  LABELS = map[host:192.168.110.236]","message":"服务器:192.168.110.236 内存使用率 \u003e 90% ,当前值:85.80% 。"},"externalURL":"http://vmalertmanager-alertmanager-0:9093","version":"4","groupKey":"{}/{instance=~\"^(?:192.168.*.*)$\"}:{instance=\"192.168.110.236\"}","truncatedAlerts":0}
2023/12/25 10:16:17.469 [I] [dingding.go:42]  [dingdingSign] 配置文件已开启钉钉加签,钉钉机器人地址解析加签参数 secret 为空,将使用不加签的地址!

Messages are sent fine without signing. What should I do about this error? Thanks!

Zhang21 commented 10 months ago

Only custom templates are affected; the default template works fine.

Zhang21 commented 10 months ago

I tested it: the ddurl passed in through the custom template has no secret parameter, so the program concludes the secret is empty.

Next question: why does the secret parameter disappear when it is passed through the custom DingTalk template?

Zhang21 commented 10 months ago

The cause is here: https://github.com/feiyu563/PrometheusAlert/blob/a1d5d6ac17d9bdb4b0b5a3f117104f703d0baedf/controllers/prometheusalert.go#L126C3-L126C3


With beego's input.Get(), ddurl=xxx&secret=xxx is treated as two separate parameters, so the &secret=xxx part never makes it into the ddurl value.
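The splitting can be reproduced with Go's standard query parser. This assumes beego's Input() ends up in the standard net/url form parsing, which matches the behaviour described above but is not verified against beego's source here; ddurlAndSecret is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// ddurlAndSecret (hypothetical) parses a raw query string with the
// standard library and returns the ddurl and secret values it yields.
func ddurlAndSecret(rawQuery string) (ddurl, secret string, err error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", "", err
	}
	return q.Get("ddurl"), q.Get("secret"), nil
}

func main() {
	// Unencoded '&': secret becomes its own top-level parameter.
	d, s, _ := ddurlAndSecret("type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=TOKEN&secret=SEC1")
	fmt.Printf("ddurl=%q secret=%q\n", d, s)
	// ddurl="https://oapi.dingtalk.com/robot/send?access_token=TOKEN" secret="SEC1"

	// '&' encoded as %26: secret stays inside the ddurl value.
	d, s, _ = ddurlAndSecret("type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=TOKEN%26secret=SEC1")
	fmt.Printf("ddurl=%q secret=%q\n", d, s)
	// ddurl="https://oapi.dingtalk.com/robot/send?access_token=TOKEN&secret=SEC1" secret=""
}
```

The second case is why the %26 workaround lets the program see the address and its secret as one value.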

Zhang21 commented 10 months ago

There are two temporary workarounds. Neither one breaks passing multiple DingTalk addresses or mixing signed and unsigned robots.

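1. Configure the addresses in an alert group and pass alertgroup=<alert-group> as a parameter.
2. Alternatively, encode the & as %26, e.g. ddurl=xxx%26secret=xxx, so the program treats the address and the secret as a single value.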

# Usage for method 1
http://xxx:8080/prometheusalert?type=dd&tpl=prometheus-dd&alertgroup=<alert-group>&at=xxx

# Usage for method 2
http://xxx:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=xxx%26secret=xxx
Zhang21 commented 10 months ago

So when handling the custom template's parameters, ddurl and secret have to be read separately, joined into one string, and the result assigned back to ddurl.

// c is the beego controller handling the request; c.Input() returns url.Values
ddurl := c.Input().Get("ddurl")
secret := c.Input().Get("secret")
if len(secret) != 0 {
    ddurl = ddurl + "&secret=" + secret
}
Zhang21 commented 10 months ago

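But that raises another problem: the ddurl parameter of a custom template URL can contain several addresses, some signed and some not, and the handling above breaks down again.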

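The example above only works for a single address; multiple addresses are still a problem. So for now I won't submit a code change, only a documentation update.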
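One concrete way the join-back approach fails can be shown with the standard query parser (assuming beego behaves the same way). rebuildDDURL is a hypothetical helper that applies the snippet above and then splits the comma-separated address list. With two signed robots, the second robot's URL is absorbed into the first secret value and the second secret is dropped, since Get returns only the first value.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rebuildDDURL (hypothetical) reads ddurl and secret separately, joins them
// back together, and returns the comma-separated addresses that result.
func rebuildDDURL(rawQuery string) ([]string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	ddurl := q.Get("ddurl")
	// Get returns only the first "secret" value; any later ones are ignored.
	if secret := q.Get("secret"); secret != "" {
		ddurl = ddurl + "&secret=" + secret
	}
	return strings.Split(ddurl, ","), nil
}

func main() {
	// One signed robot: the secret ends up on the right address.
	one, _ := rebuildDDURL("ddurl=https://oapi.dingtalk.com/robot/send?access_token=A&secret=SA")
	fmt.Println(one)

	// Two signed robots: robot B comes back without its secret SB,
	// so it would be called unsigned and rejected by DingTalk.
	two, _ := rebuildDDURL("ddurl=https://oapi.dingtalk.com/robot/send?access_token=A&secret=SA,https://oapi.dingtalk.com/robot/send?access_token=B&secret=SB")
	fmt.Println(two)
}
```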

Zhang21 commented 10 months ago

I suggest using the temporary workarounds above for now.

Zhang21 commented 10 months ago

@feiyu563 Do you have any better ideas?

MagicStarTrace commented 10 months ago


webhook_configs:

This works for now, although the workaround is pretty hacky and convoluted.

Thank you, too!

dellnoantechnp commented 9 months ago

Specifying alertgroup in the URL fails to send:

curl -v 'http://10.103.41.159:8080/prometheusalert?type=dd&tpl=prometheus-dd&alertgroup=ag-alert' -H 'Content-type: application/json' -X POST -d '{"receiver":"sms","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"ProbeFailure","instance":"https://server.example.org","job":"http_checks","monitor":"master","severity":"critical"},"annotations":{"description":"Instance https://server.example.org has been down for over 5m. Job: http_checks","summary":"BlackBox Probe Failure: https://server.example.org"},"startsAt":"2023-02-06T13:08:45.828Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus.example.org:9090/graph?g0.expr=probe_success+%3D%3D+0\\u0026g0.tab=1ArgoCD","fingerprint":"1a30ba71cca2921f"}],"groupLabels":{"alertname":"ProbeFailure"},"commonLabels":{"alertname":"ProbeFailure","instance":"https://server.example.org","job":"http_checks","monitor":"master","severity":"critical"},"commonAnnotations":{"description":"Instance https://server.example.org has been down for over 5m. Job: http_checks","summary":"BlackBox Probe Failure: https://server.example.org"},"externalURL":"http://prometheus.example.org:9093","version":"4","groupKey":"{}/{severity=\"critical\"}:{alertname=\"ProbeFailure\"}","truncatedAlerts":0}'

Response:

{\"errcode\":310000,\"errmsg\":\"description:机器人发送签名不匹配;solution:请确认签名和生成签名的时间戳必须都放在调用的网址中,请确认机器人的密钥加密和填写正确

Using the second method, replacing & with %26:

curl -v 'http://10.103.41.159:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=9e67xxxxxx%26secret=SECxxxxxxxxx' -H 'Content-type: application/json' -X POST -d '{"receiver":"sms","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"ProbeFailure","instance":"https://server.example.org","job":"http_checks","monitor":"master","severity":"critical"},"annotations":{"description":"Instance https://server.example.org has been down for over 5m. Job: http_checks","summary":"BlackBox Probe Failure: https://server.example.org"},"startsAt":"2024-02-22T13:48:45.828Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus.example.org:9090/graph?g0.expr=probe_success+%3D%3D+0\\u0026g0.tab=1ArgoCD","fingerprint":"1a30ba71cca2921f"}],"groupLabels":{"alertname":"ProbeFailure"},"commonLabels":{"alertname":"ProbeFailure","instance":"https://server.example.org","job":"http_checks","monitor":"master","severity":"critical"},"commonAnnotations":{"description":"Instance https://server.example.org has been down for over 5m. Job: http_checks","summary":"BlackBox Probe Failure: https://server.example.org"},"externalURL":"http://prometheus.example.org:9093","version":"4","groupKey":"{}/{severity=\"critical\"}:{alertname=\"ProbeFailure\"}","truncatedAlerts":0}'

Response:

{\"errcode\":310000,\"errmsg\":\"description:机器人发送签名不匹配;solution:请确认签名和生成签名的时间戳必须都放在调用的网址中,请确认机器人的密钥加密和填写正确;link:请参考本接口对应文档获得具体要求 .....
Zhang21 commented 8 months ago

@dellnoantechnp Are you running the latest branch code?

dellnoantechnp commented 8 months ago


Tested successfully with the new container image feiyu563/prometheus-alert:master. 🎉🎉

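This version really deserves its own v4.9.1 image tag. If you go to Docker Hub looking for the image matching the 4.9 release on GitHub, both the v4.9 and latest tags point to the buggy version, which is easily misleading.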


....
[ag-demo]
wxurl=wxurl1,wxurl2
ddurl=ddurl1,ddurl1,
fsurl=fsurl1
email=email1,
phone=phone1,phone2
groupid=groupid1

[ag-alert]
ddurl=https://open.dingtalk.com/robot/send?access_token=***&secret=***,https://open.dingtalk.com/robot/send?access_token
.....

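P.S.: If you call the API with the alertgroup approach and the configuration above, use the group name ag-alert in the call. The split parameter controls whether alerts are split up. It defaults to true, meaning a batch of alerts pushed by Alertmanager in one request is split into separate robot messages in the group chat; split=false sends them merged. For example: curl -v 'http://xxxx:8080/prometheusalert?type=dd&tpl=prometheus-dd&alertgroup=ag-alert&split=false'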
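Method 1 sidesteps the problem because the ddurl list comes from the config file, not from the request's query string, so each &secret=... stays attached to its own address. The sketch below shows one way a comma-separated alert-group ddurl value could be split into robots with their own secrets. parseGroupDDURL and the robot type are hypothetical; this is not necessarily how PrometheusAlert parses the [ag-alert] section.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// robot is one DingTalk robot entry from an alert group's ddurl list.
type robot struct {
	URL    string // robot URL with the secret parameter removed
	Secret string // empty when the robot is not signed
}

// parseGroupDDURL (hypothetical) splits a comma-separated ddurl value and
// reads each entry's own secret from that entry's query string.
func parseGroupDDURL(value string) ([]robot, error) {
	var robots []robot
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // tolerate trailing commas such as "ddurl1,ddurl1,"
		}
		u, err := url.Parse(entry)
		if err != nil {
			return nil, err
		}
		q := u.Query()
		secret := q.Get("secret")
		q.Del("secret")
		u.RawQuery = q.Encode()
		robots = append(robots, robot{URL: u.String(), Secret: secret})
	}
	return robots, nil
}

func main() {
	robots, err := parseGroupDDURL("https://oapi.dingtalk.com/robot/send?access_token=A&secret=SA,https://oapi.dingtalk.com/robot/send?access_token=B,")
	if err != nil {
		panic(err)
	}
	for _, r := range robots {
		fmt.Printf("%s signed=%t\n", r.URL, r.Secret != "")
	}
}
```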