Open Haleygo opened 4 months ago
The second option listed would be an amazing addition. I could even foresee something like:
##The name of the group. Must be unique within a file.
name:
##replay this group when it starts for the first time
replay: true
replayFrom: 30d
This would eliminate the need to specify and manage specific dates, especially if the rule group is updated semi-frequently. Thoughts?
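To make the relative form concrete, here is a minimal sketch of how a value like `replayFrom: 30d` could be turned into an absolute replay window at group start. Both `replayFrom` and the helper names (`parse_lookback`, `replay_window`) are hypothetical; nothing like this exists in vmalert today, and the set of supported units is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical mapping from duration suffixes to timedelta arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_lookback(value: str) -> timedelta:
    """Parse a simple duration such as '30d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def replay_window(replay_from: str, now: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) window that a relative replayFrom value covers."""
    return now - parse_lookback(replay_from), now


# Example: a group that first starts on 2024-06-30 with replayFrom: 30d.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
start, end = replay_window("30d", now)
print(start.isoformat(), "->", end.isoformat())
```

Because the window is computed relative to the moment the group starts, the rule file never needs a hard-coded date, which is the point of the suggestion above.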
Of course, if we decide to implement the second option.
Is your feature request related to a problem? Please describe
vmalert supports backfilling (aka replay) of alerting and recording rules as a CLI tool that exits as soon as the work is done. It provides an interface that displays the execution progress.
We normally recommend using one-off jobs, such as Kubernetes Jobs, to perform replay operations. But sometimes the user who manages the rules doesn't have permission to create K8s resources, or doesn't want extra code to manage those jobs.
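For reference, the one-off job mentioned above boils down to a single vmalert invocation with a fixed time range. The sketch below only assembles that command line. The flags `-rule`, `-datasource.url`, `-remoteWrite.url`, `-replay.timeFrom` and `-replay.timeTo` are vmalert's existing replay flags; the helper name `build_replay_command`, the file name, the URLs and the timestamps are placeholders chosen for illustration.

```python
import shlex
from typing import List


def build_replay_command(
    rule_file: str,
    datasource_url: str,
    remote_write_url: str,
    time_from: str,
    time_to: str,
) -> List[str]:
    """Assemble the argument list for a one-off vmalert replay run."""
    return [
        "vmalert",  # assumes the binary is on PATH
        f"-rule={rule_file}",
        f"-datasource.url={datasource_url}",
        f"-remoteWrite.url={remote_write_url}",
        f"-replay.timeFrom={time_from}",
        f"-replay.timeTo={time_to}",
    ]


# Placeholder values; a job wrapper would pass this list to its process runner.
cmd = build_replay_command(
    "rules.yml",
    "http://localhost:8428",
    "http://localhost:8428",
    "2024-05-31T00:00:00Z",
    "2024-06-30T00:00:00Z",
)
print(shlex.join(cmd))
```

Every replay needs someone to pick the time range and to create and clean up the job that runs this command, which is the overhead the request is trying to avoid.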
Describe the solution you'd like
`replay_rule_queue_number`).
Adding replay options to rule group and rule: vmalert will try replaying recording rules when this group/rule starts. The options are only valid for recording rules. We first check whether the rule has been replayed before by querying the datasource for the replay-success metric `vmalert_replay_successed{group="", id=<rule parameter hash>, }`.
Q: `replay: true` param when group replay is done; this can be checked via vmalert logs or the metric `vmalert_replay_successed{group="", id=<rule parameter hash>, } value=FinishedTimestamp`. But it is also OK not to remove this parameter immediately after the replay is over. By default, we check `vmalert_replay_successed{group="", id=<rule parameter hash>, } value=FinishedTimestamp` for 30 days and skip the rule replay if it has already succeeded within the last 30 days.
The param is checked when the group starts (a group start happens when vmalert starts or when the group is created or updated). We run an extra query, `absent_over_time(recording_rule_name[30d])`, against the datasource to determine whether this rule needs to be replayed this time.
About extra resources for the above proposals, we should have some default limits, including: