apache / hertzbeat

Apache HertzBeat (incubating) is a real-time monitoring system with agentless, performance cluster, Prometheus-compatible, custom monitoring, and status page building capabilities.
https://hertzbeat.apache.org/
Apache License 2.0

[Task] The problem of alarm triggering and recovery time in the same alarm information #1405

Closed tengfei-wu closed 8 months ago

tengfei-wu commented 9 months ago

Question

While using HertzBeat I ran into the following problem: the current template reports only a single time, whether the notification is for a triggered alarm or a recovered one, so it is sometimes impossible to tell which alarm, triggered at what time, a recovery refers to. I would like the trigger and recovery times of the same alarm to be linked: when an alarm notification is sent, it should show the time the alarm was triggered, and when the alarm recovers, the notification should show both the trigger time and the recovery time. That would make it easier for users to tell apart alarms from different time periods (as shown in the attachment). The attachment is an alarm recovery notification I received when using Prometheus:

prometheus-alertinfo
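
The requested behaviour can be sketched as follows. This is only an illustration under assumptions: the `Alert` class, its field names, and the `FIRING`/`RESOLVED` labels are hypothetical and are not HertzBeat's actual alert model or template output. The point is that the trigger time is always shown, and the recovery time is added once the alarm has recovered.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class Alert:
    # Hypothetical fields for illustration, not HertzBeat's real alert entity.
    name: str
    triggered_at: datetime
    restored_at: Optional[datetime] = None


def format_alert(alert: Alert) -> str:
    """Build the notification text; the trigger time is always included."""
    status = "RESOLVED" if alert.restored_at is not None else "FIRING"
    lines = [
        f"[{status}] {alert.name}",
        f"Triggered at: {alert.triggered_at.strftime(TIME_FORMAT)}",
    ]
    # Only a recovered alarm carries a second timestamp.
    if alert.restored_at is not None:
        lines.append(f"Restored at: {alert.restored_at.strftime(TIME_FORMAT)}")
    return "\n".join(lines)


firing = Alert("cpu usage high", datetime(2024, 1, 1, 8, 30, 0))
restored = Alert(
    "cpu usage high",
    datetime(2024, 1, 1, 8, 30, 0),
    datetime(2024, 1, 1, 9, 5, 0),
)
print(format_alert(firing))
print(format_alert(restored))
```

With both timestamps in the recovery message, a reader can match a recovery to the alarm that triggered it without searching for the earlier notification.
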
Calvin979 commented 9 months ago

I'd like to help!

Calvin979 commented 8 months ago

Hi @tomsun28, I changed the alarm recovery time accordingly. The result looks like this: image

The corresponding alarm email: image

The corresponding alarm recovery email: image

One question remains: is there a way to test all the other alarm notification channels? I checked, and there are many of them: DingTalk, Huawei Cloud, Feishu, WeCom, Telegram, and so on. How should these be tested? Or are unit tests enough?

tomsun28 commented 8 months ago

@Calvin979 👍👍 Hi, happy New Year! There is no need to test every channel in a real environment. It is enough to confirm that the rendered content looks right when debugging the code, because all of the notifications are rendered with FreeMarker templates.
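
A rough sketch of that idea: render each channel's template from one shared model and check the rendered text, without sending anything. Everything here is an assumption for illustration. Python's `string.Template` stands in for FreeMarker, and the channel names, template bodies, and field names are hypothetical rather than taken from HertzBeat's templates.

```python
from string import Template

# Hypothetical per-channel templates; string.Template is a stand-in for FreeMarker.
TEMPLATES = {
    "email": Template(
        "<p>Alert: $name</p><p>Triggered: $trigger_time</p><p>Restored: $restore_time</p>"
    ),
    "dingtalk": Template(
        "#### $name\n- Triggered: $trigger_time\n- Restored: $restore_time"
    ),
    "telegram": Template(
        "$name\nTriggered: $trigger_time\nRestored: $restore_time"
    ),
}


def render_all(model: dict) -> dict:
    """Render every channel template from one shared model, without sending anything."""
    return {channel: tpl.substitute(model) for channel, tpl in TEMPLATES.items()}


def missing_fields(rendered: dict, model: dict) -> list:
    """Return (channel, field) pairs whose value does not appear in the rendered text."""
    return [
        (channel, field)
        for channel, text in rendered.items()
        for field, value in model.items()
        if value not in text
    ]


model = {
    "name": "cpu usage high",
    "trigger_time": "2024-01-01 08:30:00",
    "restore_time": "2024-01-01 09:05:00",
}
rendered = render_all(model)
print(missing_fields(rendered, model))
```

Because every channel goes through the same rendering step, a check like this covers the templates themselves. Delivery to DingTalk, Telegram, and the other services is a separate concern that this sketch does not exercise.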

Calvin979 commented 8 months ago

@tomsun28 Happy New Year! The corresponding PR is #1464.