VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0

Alert is shown even no restart happened for vm-agent #252

Closed (eugenegoncharuk closed this 1 month ago)

eugenegoncharuk commented 2 years ago

I have the default rules set up for my cluster. The TooManyRestarts rule, defined below, keeps sending me alerts:

  - name: vm-health
    rules:
    - alert: TooManyRestarts
      annotations:
        description: Job {{`{{`}} $labels.job {{`}}`}} has restarted more than twice in the last 15 minutes. It might be crashlooping.
        summary: '{{`{{`}} $labels.job {{`}}`}} too many restarts (instance {{`{{`}} $labels.instance {{`}}`}})'
      expr: changes(process_start_time_seconds{job=~"victoriametrics|vmagent|vmalert"}[15m]) > 2
      labels:
        severity: critical
{{- if .Values.defaultRules.additionalRuleLabels }}
{{ toYaml .Values.defaultRules.additionalRuleLabels | indent 8 }}
{{- end }}

When I check my "vm-agent" pods, they have not been restarted at all. Maybe it would be worth rewriting the rule to use a different metric?

f41gh7 commented 2 years ago

Yes, it seems that the alert logic is incorrect for Kubernetes workloads. cc @hagen1778

hagen1778 commented 2 years ago

@f41gh7 why do you think the logic is incorrect?

@eugenegoncharuk could you please show the query expression result (a Grafana screenshot, maybe) for the time when the alert triggered? Thanks!
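For context, the rule relies on `changes()`, which counts how many times a series' value differs between consecutive samples inside the lookbehind window. The following is a minimal Python sketch of that counting, not the actual VictoriaMetrics implementation; the function names and the sample timestamps are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def changes(samples: Sequence[float]) -> int:
    """Count how many times the value differs from the previous sample."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def too_many_restarts(samples: Sequence[float], threshold: int = 2) -> bool:
    """Mirror the rule's condition: changes(...[15m]) > 2."""
    return changes(samples) > threshold


# Hypothetical process_start_time_seconds samples within one 15m window.
stable = [1700000000.0] * 6                              # no restart
one_restart = [1700000000.0] * 3 + [1700000400.0] * 3    # one restart
crashloop = [1700000000.0, 1700000100.0,
             1700000200.0, 1700000300.0]                 # three restarts

for name, series in [("stable", stable),
                     ("one_restart", one_restart),
                     ("crashloop", crashloop)]:
    print(name, changes(series), too_many_restarts(series))
```

Under this model the alert should fire only when the metric takes more than two new values within 15 minutes. Note that any variation in the stored value is counted, whether or not a restart caused it, which is why comparing the raw metric with the alert expression over the same time range is useful.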

7840vz commented 4 months ago

Same here, we seem to be getting false positives. Metric:

image

Alert expression, same timeline:

image

AndrewChubatiuk commented 1 month ago

Looks like it's related to issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767#issuecomment-1650932203, and it was fixed in v1.97.3.