VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0

operator fails to start due to manager not in $PATH #1127

Closed cathelijne closed 2 weeks ago

cathelijne commented 3 weeks ago

Helm chart 0.24.*, deploying from argocd.

  source:
    repoURL: https://victoriametrics.github.io/helm-charts/
    chart: victoria-metrics-k8s-stack
    targetRevision: 0.24.*

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "manager": executable file not found in $PATH: unknown

kube server: v1.27.7+k3s2
argocd: v2.10.12+cb6f5ac
operator: v0.46.3

Reverting to helm release 0.23.5 works fine.

[Screenshot 2024-07-08 at 12 46 33]

(Sorry for reporting by screenshot; I had already pushed the revert commit for argo to use when it dawned on me that I should probably report this.)

f41gh7 commented 2 weeks ago

An issue with the operator start was fixed in chart version v0.24.1:

vm/victoria-metrics-k8s-stack                       0.24.1          v1.101.0

dhess commented 2 weeks ago

I just upgraded to v0.24.1 and this is still happening.

f41gh7 commented 2 weeks ago

> I just upgraded to v0.24.1 and this is still happening.

It's really weird: for some reason argocd doesn't apply the changes to the operator deployment definition. The new version doesn't have a command specification for the operator's container.

dhess commented 2 weeks ago

Yes, I can confirm that we're deploying using Argo CD and that's where we're seeing it.

f41gh7 commented 2 weeks ago

@k1rk @Haleygo maybe you could help: does any additional option need to be set for argocd? Maybe a sync policy?

cathelijne commented 2 weeks ago

> I just upgraded to v0.24.1 and this is still happening.

> It's really weird, for some reason argocd doesn't apply changes to operator deployment definition. New version doesn't have command specification for operator's container.

This had me thinking... I 'upgraded' to 0.24.1 in the argocd app and the operator failed to start again with the message that "manager" is not found in the $PATH. Looking at the applied deployment manifest for 0.23.5 shows a command entry in the container spec. I then searched for 'command:' in the argocd chart commits, but never found it. Before digging too deep, I decided to have a look at not only the live, but also the desired manifest in argocd.

The applied (i.e. live) manifest for 0.23.5 has a command: entry in the container spec. The desired manifest for 0.24.1 does not have this, yet it doesn't show up in the diff between the two operator versions. The diff is completely empty, even.

Deleting the operator deployment from argocd and then syncing it again applies the correct manifest and installs 0.46.4 as intended.

All in all, this looks more like an issue with argocd than a bug in the chart. It should have shown up in the diff.

Edit: I found an issue (https://github.com/argoproj/argo-cd/issues/19015) in the argocd repo. I am indeed using server-side apply for my vm k8s stack chart.
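
For anyone who wants to repeat the live-versus-desired check described above outside the argocd UI, here is a minimal sketch. It assumes the two container specs have already been loaded into plain Python dicts; the function name `stale_fields` and the sample manifests (including the image names) are hypothetical placeholders, not taken from the chart.

```python
from __future__ import annotations

from typing import Any


def stale_fields(live: dict[str, Any], desired: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths of keys that are set in `live` but absent from `desired`."""
    stale: list[str] = []
    for key, live_value in live.items():
        here = f"{path}.{key}" if path else key
        if key not in desired:
            # Present on the cluster, but no longer rendered by the chart.
            stale.append(here)
        elif isinstance(live_value, dict) and isinstance(desired[key], dict):
            # Recurse into nested mappings that exist on both sides.
            stale.extend(stale_fields(live_value, desired[key], here))
    return stale


# Hypothetical, heavily trimmed container specs (placeholder image names).
live_container = {
    "name": "operator",
    "image": "example/operator:old",
    "command": ["manager"],
}
desired_container = {
    "name": "operator",
    "image": "example/operator:new",
}

print(stale_fields(live_container, desired_container))
```

On a real cluster the live object also carries fields defaulted by Kubernetes (for example `imagePullPolicy`), so the result needs manual filtering; the sketch only makes a leftover `command:` entry easy to spot.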

Haleygo commented 2 weeks ago

@dhess Could you try a force sync on the argocd side? As @cathelijne's comment suggests, argocd might not have applied the changed command to the operator deployment.

dhess commented 2 weeks ago

@Haleygo Force sync didn't work and resulted in the same error.

However, after I deleted the deployment from the Argo CD UI, the recreated deployment worked fine, so I'm up and running with v0.24.2 of the chart. Thanks!

Haleygo commented 2 weeks ago

Closing this as there is no bug in the chart. For users who hit the same issue after an update, please delete the operator deployment and re-deploy.
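
The workaround can also be scripted rather than done from the Argo CD UI. The sketch below only builds `kubectl` command lines and does not run them; it assumes `kubectl` access to the cluster, and the deployment name and namespace are hypothetical placeholders that must be replaced with the real ones.

```python
from __future__ import annotations

import shlex


def inspect_command_argv(name: str, namespace: str) -> list[str]:
    """Build a kubectl call that prints the first container's `command` field."""
    return [
        "kubectl", "get", "deployment", name,
        "--namespace", namespace,
        "-o", "jsonpath={.spec.template.spec.containers[0].command}",
    ]


def delete_deployment_argv(name: str, namespace: str) -> list[str]:
    """Build a kubectl call that deletes the deployment so a later sync recreates it."""
    return ["kubectl", "delete", "deployment", name, "--namespace", namespace]


# Hypothetical names; look up the real ones with `kubectl get deployments -A`.
NAME = "example-operator"
NAMESPACE = "example-namespace"

for argv in (inspect_command_argv(NAME, NAMESPACE), delete_deployment_argv(NAME, NAMESPACE)):
    # Print shell-quoted commands for review instead of executing them.
    print(shlex.join(argv))
```

If the first command still prints a stale `command` value after an upgrade, running the second one and then syncing the application again matches the fix reported in this thread.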