VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0
[victoria-metrics-k8s-stack] Helm upgrade to 0.25.16 fails #1508

Status: Closed (closed by notz 3 days ago)

notz commented 3 days ago

It seems that the operator's webhook port was changed from 443 to 9443 in operator chart version 0.34.8, but the upgrade still makes requests to port 443.

Helm upgrade failed for release monitoring/vm with chart victoria-metrics-k8s-stack@0.25.16: failed to create resource: Internal error occurred: failed calling webhook "vmrule.victoriametrics.com": failed to call webhook: Post "https://vm-victoria-metrics-operator.monitoring.svc:443/validate-operator-victoriametrics-com-v1beta1-vmrule?timeout=10s": no service port 443 found for service "vm-victoria-metrics-operator"

Last Helm logs:

2024-09-17T10:31:00.217044831Z: Patch VMNodeScrape "vm-victoria-metrics-k8s-stack-cadvisor" in namespace monitoring
2024-09-17T10:31:00.268678247Z: Patch VMNodeScrape "vm-victoria-metrics-k8s-stack-kubelet" in namespace monitoring
2024-09-17T10:31:00.324614279Z: Patch VMNodeScrape "vm-victoria-metrics-k8s-stack-probes" in namespace monitoring
2024-09-17T10:31:00.392711819Z: Patch VMRule "vm-victoria-metrics-k8s-stack-alertmanager.rules" in namespace monitoring
2024-09-17T10:31:00.420327221Z: error updating the resource "vm-victoria-metrics-k8s-stack-alertmanager.rules": cannot patch "vm-victoria-metrics-k8s-stack-alertmanager.rules" with kind VMRule: Internal error occurred: failed calling webhook "vmrule.victoriametrics.com": failed to call webhook: Post "https://vm-victoria-metrics-operator.monitoring.svc:443/validate-operator-victoriametrics-com-v1beta1-vmrule?timeout=10s": no service port 443 found for service "vm-victoria-metrics-operator"
2024-09-17T10:31:00.458372807Z: Patch VMRule "vm-victoria-metrics-k8s-stack-etcd" in namespace monitoring
2024-09-17T10:31:00.502509452Z: error updating the resource "vm-victoria-metrics-k8s-stack-etcd": cannot patch "vm-victoria-metrics-k8s-stack-etcd" with kind VMRule: Internal error occurred: failed calling webhook "vmrule.victoriametrics.com": failed to call webhook: Post "https://vm-victoria-metrics-operator.monitoring.svc:443/validate-operator-victoriametrics-com-v1beta1-vmrule?timeout=10s": no service port 443 found for service "vm-victoria-metrics-operator"
2024-09-17T10:31:00.541354085Z: Patch VMRule "vm-victoria-metrics-k8s-stack-general.rules" in namespace monitoring
2024-09-17T10:31:00.590095389Z: error updating the resource "vm-victoria-metrics-k8s-stack-general.rules": cannot patch "vm-victoria-metrics-k8s-stack-general.rules" with kind VMRule: Internal error occurred: failed calling webhook "vmrule.victoriametrics.com": failed to call webhook: Post "https://vm-victoria-metrics-operator.monitoring.svc:443/validate-operator-victoriametrics-com-v1beta1-vmrule?timeout=10s": no service port 443 found for service "vm-victoria-metrics-operator"
2024-09-17T10:31:01.082620751Z: warning: Upgrade "vm" failed: failed to create resource: Internal error occurred: failed calling webhook "vmrule.victoriametrics.com": failed to call webhook: Post "https://vm-victoria-metrics-operator.monitoring.svc:443/validate-operator-victoriametrics-com-v1beta1-vmrule?timeout=10s": no service port 443 found for service "vm-victoria-metrics-operator"
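
The port mismatch reported above could be confirmed with something like the following. This is only a rough sketch, assuming kubectl access to the cluster. The Service name and namespace come from the error message, the webhook configuration name is the one mentioned further down in this thread, and the two function names are hypothetical helpers.

```shell
#!/bin/sh
# Names taken from the error message and from this thread.
SERVICE="vm-victoria-metrics-operator"
NAMESPACE="monitoring"
WEBHOOK_CONFIG="vm-victoria-metrics-operator-admission"

# Hypothetical helper: list the ports the operator Service actually exposes.
service_ports() {
  kubectl get service "$SERVICE" --namespace "$NAMESPACE" \
    --output jsonpath='{.spec.ports[*].port}'
}

# Hypothetical helper: list the Service port each webhook entry points at.
webhook_ports() {
  kubectl get validatingwebhookconfiguration "$WEBHOOK_CONFIG" \
    --output jsonpath='{.webhooks[*].clientConfig.service.port}'
}
```

If `webhook_ports` prints 443 while `service_ports` does not include 443, the webhook configuration is presumably a leftover from before the port change.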

AndrewChubatiuk commented 3 days ago

Hey @notz, the changelog recommends disabling the validating webhook before an upgrade. Now you can delete the ValidatingWebhookConfiguration and then apply the changes again.

notz commented 3 days ago

@AndrewChubatiuk thx & sorry. Missed the release changelog. Only checked the repo's readme & changelog.

dhess commented 3 days ago

@notz Did this suggestion work for you? I tried disabling the webhook and then upgrading, but got the same error.

AndrewChubatiuk commented 3 days ago

There are two options:

  1. Disable the webhook, apply the old version, apply the new version, then apply the new version with the webhook enabled.
  2. Delete the ValidatingWebhookConfiguration using kubectl, then apply the new version.

notz commented 3 days ago

@dhess Yes, I deleted the ValidatingWebhookConfiguration (vm-victoria-metrics-operator-admission) and the upgrade worked fine.
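
For reference, the second option could look roughly like the sketch below. The webhook configuration name, release name, namespace and chart version are taken from this thread. The chart reference `vm/victoria-metrics-k8s-stack` (a repo alias), the `values.yaml` path, and the `run` and `upgrade_without_stale_webhook` helpers are assumptions for illustration. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of option 2: delete the stale webhook configuration, then upgrade.
WEBHOOK_CONFIG="vm-victoria-metrics-operator-admission"
RELEASE="vm"
NAMESPACE="monitoring"
CHART="vm/victoria-metrics-k8s-stack"   # assumed repo alias, adjust to your setup
CHART_VERSION="0.25.16"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: ValidatingWebhookConfiguration is cluster-scoped,
# so the delete needs no namespace flag.
upgrade_without_stale_webhook() {
  run kubectl delete validatingwebhookconfiguration "$WEBHOOK_CONFIG" &&
  run helm upgrade "$RELEASE" "$CHART" --version "$CHART_VERSION" \
    --namespace "$NAMESPACE" --values values.yaml
}

# Dry run by default: only shows what would be executed.
upgrade_without_stale_webhook
```

The dry-run default is a deliberate choice, since deleting the webhook configuration turns off admission validation until the chart recreates it during the upgrade.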

dhess commented 3 days ago

@AndrewChubatiuk OK, here's what I did:

  1. While running victoria-metrics-k8s-stack v0.25.15, I set victoria-metrics-operator.admissionWebhooks.enabled: false in values.yaml and applied it. Argo CD successfully deleted the victoria-metrics-victoria-metrics-operator-admission webhooks.
  2. I upgraded the chart to v0.25.16 and applied it.
  3. Argo CD could not sync: one or more objects failed to apply, reason: Service "victoria-metrics-victoria-metrics-operator" is invalid: spec.ports[2].name: Duplicate value: "webhook". Retrying attempt #5 at 12:52PM.

This repeats over and over.

dhess commented 2 days ago

I filed https://github.com/VictoriaMetrics/helm-charts/issues/1509, as it's technically a different error from the one described here.