Open JellyZero opened 1 year ago
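I'd like Nightingale (夜莺) to support configuring Prometheus recording rules from its UI: the rules are entered in Nightingale, Nightingale evaluates them, and the computed results are then written back to Prometheus.
For example, rules like these: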
```yaml
---
groups:
- name: sloth-slo-sli-recordings-myservice-requests-availability
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[5m])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[5m])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[30m])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[30m])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[1h])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[1h])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[2h])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[2h])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[6h])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[6h])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[1d])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[1d])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: |
      (sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[3d])))
      /
      (sum(rate(http_request_duration_seconds_count{job="myservice"}[3d])))
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}[30d])
      / ignoring (sloth_window)
      count_over_time(slo:sli_error:ratio_rate5m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}[30d])
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_window: 30d
- name: sloth-slo-meta-recordings-myservice-requests-availability
  rules:
  - record: slo:objective:ratio
    expr: vector(0.9990000000000001)
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: slo:error_budget:ratio
    expr: vector(1-0.9990000000000001)
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}
      / on(sloth_id, sloth_slo, sloth_service) group_left
      slo:error_budget:ratio{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}
      / on(sloth_id, sloth_slo, sloth_service) group_left
      slo:error_budget:ratio{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"}
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_service: myservice
      sloth_slo: requests-availability
  - record: sloth_slo_info
    expr: vector(1)
    labels:
      owner: myteam
      repo: myorg/myservice
      sloth_id: myservice-requests-availability
      sloth_mode: cli-gen-prom
      sloth_objective: "99.9"
      sloth_service: myservice
      sloth_slo: requests-availability
      sloth_spec: prometheus/v1
      sloth_version: a9d9dc42fb66372fb1bd2c69ca354da4ace51b65
- name: sloth-slo-alerts-myservice-requests-availability
  rules:
  - alert: MyServiceHighErrorRate
    expr: |
      (
          max(slo:sli_error:ratio_rate5m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (14.4 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate1h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (14.4 * 0.0009999999999999432)) without (sloth_window)
      )
      or
      (
          max(slo:sli_error:ratio_rate30m{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (6 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate6h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (6 * 0.0009999999999999432)) without (sloth_window)
      )
    labels:
      category: availability
      routing_key: myteam
      severity: pageteam
      sloth_severity: page
    annotations:
      summary: High error rate on 'myservice' requests responses
      title: (page) {{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget burn rate is too fast.
  - alert: MyServiceHighErrorRate
    expr: |
      (
          max(slo:sli_error:ratio_rate2h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (3 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate1d{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (3 * 0.0009999999999999432)) without (sloth_window)
      )
      or
      (
          max(slo:sli_error:ratio_rate6h{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (1 * 0.0009999999999999432)) without (sloth_window)
          and
          max(slo:sli_error:ratio_rate3d{sloth_id="myservice-requests-availability", sloth_service="myservice", sloth_slo="requests-availability"} > (1 * 0.0009999999999999432)) without (sloth_window)
      )
    labels:
      category: availability
      severity: slack
      slack_channel: '#alerts-myteam'
      sloth_severity: ticket
    annotations:
      summary: High error rate on 'myservice' requests responses
      title: (ticket) {{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget burn rate is too fast.
```
These rules are evaluated periodically in a fixed order, and they depend on one another.
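To evaluate rules like these correctly, an engine has to run each recording rule after every rule whose output it reads (for example, `slo:sli_error:ratio_rate30d` reads `slo:sli_error:ratio_rate5m`, and `slo:period_burn_rate:ratio` reads both `slo:sli_error:ratio_rate30d` and `slo:error_budget:ratio`). A minimal Go sketch of deriving that order is a topological sort with Kahn's algorithm. It assumes a hypothetical `Rule` type, unique `record` names, and a rough regex scan of each `expr` for other rules' names; that scan is a heuristic, not a PromQL parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule is a hypothetical, minimal representation of one recording rule.
type Rule struct {
	Record string // series name the rule produces; assumed unique
	Expr   string // PromQL expression
}

// metricName matches tokens shaped like Prometheus metric names.
var metricName = regexp.MustCompile(`[a-zA-Z_:][a-zA-Z0-9_:]*`)

// deps returns, without duplicates, the known record names that expr appears to read.
// Heuristic: any token equal to a record name counts, even inside a label value.
func deps(expr string, known map[string]Rule) []string {
	seen := map[string]bool{}
	var out []string
	for _, tok := range metricName.FindAllString(expr, -1) {
		if _, ok := known[tok]; ok && !seen[tok] {
			seen[tok] = true
			out = append(out, tok)
		}
	}
	return out
}

// OrderRules returns the rules ordered so that each rule comes after the rules it reads.
// Ties keep the input order. It returns an error if the dependencies form a cycle.
func OrderRules(rules []Rule) ([]Rule, error) {
	byName := make(map[string]Rule, len(rules))
	for _, r := range rules {
		byName[r.Record] = r
	}
	indeg := make(map[string]int, len(rules)) // number of unmet dependencies per rule
	users := make(map[string][]string)       // dependency -> rules that read it
	for _, r := range rules {
		for _, d := range deps(r.Expr, byName) {
			if d == r.Record {
				continue // reading its own previous value is not an ordering constraint
			}
			indeg[r.Record]++
			users[d] = append(users[d], r.Record)
		}
	}
	var queue []string
	for _, r := range rules {
		if indeg[r.Record] == 0 {
			queue = append(queue, r.Record)
		}
	}
	var ordered []Rule
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		ordered = append(ordered, byName[name])
		for _, u := range users[name] {
			indeg[u]--
			if indeg[u] == 0 {
				queue = append(queue, u)
			}
		}
	}
	if len(ordered) != len(rules) {
		return nil, fmt.Errorf("dependency cycle among recording rules")
	}
	return ordered, nil
}

func main() {
	// Expressions shortened from the rules above; input deliberately out of order.
	rules := []Rule{
		{Record: "slo:period_burn_rate:ratio", Expr: `slo:sli_error:ratio_rate30d / on(sloth_id, sloth_slo, sloth_service) group_left slo:error_budget:ratio`},
		{Record: "slo:sli_error:ratio_rate30d", Expr: `sum_over_time(slo:sli_error:ratio_rate5m[30d]) / ignoring (sloth_window) count_over_time(slo:sli_error:ratio_rate5m[30d])`},
		{Record: "slo:error_budget:ratio", Expr: `vector(1-0.9990000000000001)`},
		{Record: "slo:sli_error:ratio_rate5m", Expr: `(sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[5m]))) / (sum(rate(http_request_duration_seconds_count{job="myservice"}[5m])))`},
	}
	ordered, err := OrderRules(rules)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range ordered {
		fmt.Println(r.Record)
	}
}
```

A production version would extract series selectors with a real PromQL parser (for example Prometheus's own `promql/parser` package) rather than a regex, and would have to handle several rules sharing one `record` name with different labels.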
Isn't 告警管理 (Alert Management) → 记录规则 (Recording Rules) what you're looking for?
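No. That page only lets me add a single record. What I need is to add multiple records that are evaluated periodically in a fixed order and depend on one another.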
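For the multi-record case, Prometheus itself runs the rules inside one group sequentially at a regular interval, all with the same evaluation time, which is what lets a rule build on series recorded by rules listed before it. The Go sketch below models that behaviour for a UI-managed rule group. The `RecordingRule`, `Store`, and `Sample` types are hypothetical, values carry no labels, and the raw input series (`errors:rate5m`, `requests:rate5m`) are made-up names assumed to have been fetched from Prometheus already.

```go
package main

import "fmt"

// Sample is a hypothetical single value produced by a recording rule (labels omitted).
type Sample struct {
	Series string
	Value  float64
	At     int64 // evaluation timestamp in Unix milliseconds
}

// Store is a hypothetical in-memory view holding the latest value per series.
type Store map[string]float64

// RecordingRule is a hypothetical rule whose Eval reads series from the store.
// Eval returns ok=false when an input is missing, so no sample is produced.
type RecordingRule struct {
	Record string
	Eval   func(s Store) (float64, bool)
}

// EvalGroup runs the rules one after another with a single evaluation timestamp.
// Each result is written into the store before the next rule runs, so later rules
// see values produced earlier in the same cycle.
func EvalGroup(rules []RecordingRule, s Store, at int64) []Sample {
	var out []Sample
	for _, r := range rules {
		v, ok := r.Eval(s)
		if !ok {
			continue // missing input: this rule records nothing in this cycle
		}
		s[r.Record] = v
		out = append(out, Sample{Series: r.Record, Value: v, At: at})
	}
	return out
}

func main() {
	store := Store{"errors:rate5m": 0.5, "requests:rate5m": 100}
	rules := []RecordingRule{
		{Record: "slo:sli_error:ratio_rate5m", Eval: func(st Store) (float64, bool) {
			e, ok1 := st["errors:rate5m"]
			t, ok2 := st["requests:rate5m"]
			if !ok1 || !ok2 || t == 0 {
				return 0, false
			}
			return e / t, true
		}},
		{Record: "slo:error_budget:ratio", Eval: func(Store) (float64, bool) {
			return 1 - 0.9990000000000001, true
		}},
		{Record: "slo:current_burn_rate:ratio", Eval: func(st Store) (float64, bool) {
			r, ok1 := st["slo:sli_error:ratio_rate5m"]
			b, ok2 := st["slo:error_budget:ratio"]
			if !ok1 || !ok2 || b == 0 {
				return 0, false
			}
			return r / b, true
		}},
	}
	for _, smp := range EvalGroup(rules, store, 1700000000000) {
		fmt.Printf("%s = %.4g\n", smp.Series, smp.Value)
	}
}
```

Listing `slo:current_burn_rate:ratio` before its inputs would make it record nothing in the first cycle and read the previous cycle's values afterwards, which is why the order of records within a group matters.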