VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0

Issue with default k8s VMRules. #927

Closed hagaram closed 3 weeks ago

hagaram commented 6 months ago

Hi, no matter what I've tried, and I've tried quite a lot of things, the only option left is to uninstall and reinstall the whole chart. I can't seem to successfully install chart version 0.19.0 or greater. I've tried 0.19.0, 0.19.2 and 0.19.4. The latest version at least does not ignore k8s: false.

Kubernetes version: v1.28.6

defaultRules:
  create: true
  rules:
    etcd: true
    general: true
    k8s: false
    kubeApiserver: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubelet: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeScheduler: true
    kubeStateMetrics: true
    network: true
    node: true
    vmagent: true
    vmsingle: true
    vmcluster: true
    vmhealth: true
    alertmanager: true

1. But if the setting is set to true and the chart is upgraded, it always ends up like this:
   Error: UPGRADE FAILED: no VMRule with the name "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" found

   BUT the rule gets created.

2. On the second run:
    Error: UPGRADE FAILED: cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update.

I've never ever modified anything in this chart manually, not with kubectl edit or any other means.

Does anyone have any idea what might be wrong?

Haleygo commented 6 months ago

Hello!

The latest version at least does not ignore k8s: false

Yes, we fixed that in https://github.com/VictoriaMetrics/helm-charts/pull/904, which was released in v0.19.4.

I can't seem to successfully install chart version 0.19.0 or greater. I've tried 0.19.0, 0.19.2 and 0.19.4.

From the error messages, it looks like a VMRule was left behind. Have you installed an older version before? Could you try installing again after uninstalling the previous release?

hagaram commented 5 months ago

Hi, uninstalling the release is the last resort. I don't really want to uninstall monitoring on a production system. Yes, the k8s rules were previously enabled; prior to 0.19 it worked without any issues.

It has the same behaviour on both testing and production clusters.

Haleygo commented 5 months ago

uninstalling the release is the last resort. I don't really want to uninstall monitoring on a production system.

You can just delete this vmrule and upgrade the release. Could you provide some steps for me to reproduce? I tried to install victoria-metrics-k8s-stack-k8s-0.19.2 with k8s: false and upgrade to v0.19.4 with k8s: false, and there is no error.

hagaram commented 5 months ago

Thank you very much for replying! For me, it is enough to set k8s: true, and the error occurs exactly as described in the initial post. If I delete the rule manually, the upgrade reports that there is no such rule (even when keeping the same chart version and only enabling the k8s rules) --> the rule gets created anyway --> the next upgrade fails with Invalid value: 0x0: must be specified for an update.

The only way I'm able to perform helm upgrade successfully is by disabling the default k8s rules with k8s: false. It doesn't matter whether I upgrade the chart version or keep the same one.

This rule seems to do the damage:

monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory

This is the debug output when applying the upgrade:

client.go:425: [debug] error updating the resource "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory":
     cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
client.go:693: [debug] Patch VMRule "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" in namespace monitoring
client.go:425: [debug] error updating the resource "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory":
     cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
client.go:693: [debug] Patch VMRule "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" in namespace monitoring
client.go:425: [debug] error updating the resource "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory":
     cannot patch "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" with kind VMRule: vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

guidodobboletta commented 4 months ago

Still happening on 0.22.0

Wolfeg commented 3 months ago

On 0.23.2, monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory is still broken:

❯ helm install monitoring vm/victoria-metrics-k8s-stack -n monitoring
Error: INSTALLATION FAILED: 3 errors occurred:
        * vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" already exists
        * vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" already exists
        * vmrules.operator.victoriametrics.com "monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory" already exists

Haleygo commented 2 months ago

Hi @Wolfeg @guidodobboletta, I believe the problem is the resource names. There are four VMRules named xx_k8s.rules.containermemoryxx, and their names can be truncated to 63 characters because of the length limit, which causes conflicts. The solution is to set .Values.fullnameOverride to override the default victoria-metrics-k8s-stack.fullname, for example .Values.fullnameOverride: monitoring.
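
To make the truncation effect concrete, here is a minimal Python sketch. It is not the chart's actual template logic: it assumes each name is built as `<fullname>-<group>` and then cut to 63 characters. The four group suffixes are hypothetical and only illustrate four groups that share the `k8s.rules.containermemory` prefix. The helper names `resource_name` and `names_for` are made up for this example.

```python
# Sketch only: shows how cutting generated names to 63 characters can
# collapse several distinct VMRule names into one.

MAX_NAME_LEN = 63  # length limit mentioned in this thread


def resource_name(fullname: str, group: str, limit: int = MAX_NAME_LEN) -> str:
    """Join the chart fullname and a rule-group suffix, then truncate."""
    return f"{fullname}-{group}"[:limit]


# Hypothetical group suffixes (assumption, not copied from the chart).
GROUPS = [
    "k8s.rules.containermemoryworkingsetbytes",
    "k8s.rules.containermemoryrss",
    "k8s.rules.containermemorycache",
    "k8s.rules.containermemoryswap",
]


def names_for(fullname: str) -> list[str]:
    """Return the truncated resource name for every group."""
    return [resource_name(fullname, group) for group in GROUPS]


# Release "monitoring" with the default fullname, as seen in the errors above.
default_names = names_for("monitoring-victoria-metrics-k8s-stack")
# Same groups with fullnameOverride set to "monitoring".
override_names = names_for("monitoring")

print(len(set(default_names)), "unique name(s) with the default fullname")
print(len(set(override_names)), "unique name(s) with fullnameOverride")
```

With the default fullname, the prefix `monitoring-victoria-metrics-k8s-stack-k8s.rules.containermemory` is already exactly 63 characters long, so under these assumptions every group ends up with the same name. That would match the repeated "already exists" and "cannot patch" errors reported above. With the shorter override, the names stay under the limit and remain distinct.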

Haleygo commented 2 months ago

Also related to https://github.com/VictoriaMetrics/helm-charts/pull/1093.

kamenskiyyyy commented 2 months ago

Hi @Wolfeg @guidodobboletta, I believe the problem is the resource names. There are four VMRules named xx_k8s.rules.containermemoryxx, and their names can be truncated to 63 characters because of the length limit, which causes conflicts. The solution is to set .Values.fullnameOverride to override the default victoria-metrics-k8s-stack.fullname, for example .Values.fullnameOverride: monitoring.

It helped me, thank you

AndrewChubatiuk commented 3 weeks ago

The VMRule templates have changed a lot since then, so I will close this issue. Please open a new one if the problem is still relevant.