argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

If any conversion webhook on any CRD isn't available, all apps on the cluster go to an "unknown" state. #20828

Open johnthompson-ybor opened 4 days ago

johnthompson-ybor commented 4 days ago

Describe the bug

Argocd version: v2.12.4+27d1e64

If you install any CRD with a conversion webhook on a cluster and that conversion webhook is down, all applications on the cluster go to an Unknown or error state:

Failed to load target state: failed to get cluster version for cluster "": failed to get cluster info for """: error synchronizing cache state : failed to sync cluster ": failed to load initial state of resource BucketServerSideEncryptionConfiguration.s3.aws.upbound.io: conversion webhook for s3.aws.upbound.io/v1beta1, Kind=BucketServerSideEncryptionConfiguration failed: Post "https://provider-aws-s3.crossplane-system.svc:9443/convert?timeout=30s": no endpoints available for service "provider-aws-s3"

If I have SSA (server-side apply) on, the UI just gets stuck in "refreshing" and there's a nil pointer dereference panic in the logs:

time="2024-11-18T14:19:18Z" level=error msg="Recovered from panic: runtime error: invalid memory address or nil pointer dereference

goroutine 294 [running]: runtime/debug.Stack() /usr/local/go/src/runtime/debug/stack.go:24 +0x5e

github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppRefreshQueueItem.func1() /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:1480 +0x54

panic({0x382cd20?, 0x7756330?}) /usr/local/go/src/runtime/panic.go:770 +0x132

github.com/argoproj/argo-cd/v2/controller.(*appStateManager).CompareAppState(0xc00055cd20, 0xc0dae6a408, 0xc0a7114488, {0xc0a792d6c0, 0x1, 0x1}, {0xc0a7920700, 0x1, 0x1}, 0x0, ...) /go/src/github.com/argoproj/argo-cd/controller/state.go:864 +0x5ff9

github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppRefreshQueueItem(0xc0004dec40) /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:1590 +0x1188

github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run.func3() /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:830 +0x25

k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/pkg/mod/k8s.io/apimachinery@v0.29.6/pkg/util/wait/backoff.go:226 +0x33

k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000636b00, {0x5555d00, 0xc001cec2a0}, 0x1, 0xc000081f80) /go/pkg/mod/k8s.io/apimachinery@v0.29.6/pkg/util/wait/backoff.go:227 +0xaf

k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000636b00, 0x3b9aca00, 0x0, 0x1, 0xc000081f80) /go/pkg/mod/k8s.io/apimachinery@v0.29.6/pkg/util/wait/backoff.go:204 +0x7f

k8s.io/apimachinery/pkg/util/wait.Until(...) /go/pkg/mod/k8s.io/apimachinery@v0.29.6/pkg/util/wait/backoff.go:161

created by github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run in goroutine 112 /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:829 +0x865

To Reproduce

Install a CRD with a conversion webhook that goes to an unavailable endpoint.
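
A minimal sketch of such a CRD, for illustration only. The group `example.com`, the kind `Widget`, and the Service name `missing-conversion-webhook` are hypothetical and not taken from the affected cluster. The Service is assumed not to exist, so any conversion request has nowhere to go. The API server only calls the webhook when it has to convert a stored object between versions, so reproducing the failure likely also requires at least one object stored in a version other than the one being listed.

```yaml
# Hypothetical CRD whose conversion webhook points at a Service with no endpoints.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1alpha1
      served: true
      storage: true          # objects are persisted in this version
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
    - name: v1beta1
      served: true
      storage: false         # reads of this version need conversion
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: default
          name: missing-conversion-webhook   # assumed: no such Service/endpoints
          path: /convert
          port: 9443
```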

Expected behavior

I'm not sure what the expected behavior should be. At the very least, I don't think there should be a nil pointer panic when this happens with SSA on.

It would be nice to be able to exclude those resources on an app-by-app basis, or to skip any resources that aren't included in the application. As it stands, if I need to do a new sync to fix the webhook, I can't really do it.
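
For reference, the closest existing knob is probably the instance-wide `resource.exclusions` setting in the `argocd-cm` ConfigMap. The sketch below assumes Argo CD runs in the `argocd` namespace and uses the kind from the error above. It applies to every application on the matched clusters, not to a single app, so it is at best a partial workaround and not the per-app exclusion asked for here. Whether it avoids the cache sync failure in this case is untested.

```yaml
# Sketch: exclude the failing kind from Argo CD's cluster cache, instance-wide.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd          # assumed install namespace
data:
  resource.exclusions: |
    - apiGroups:
      - s3.aws.upbound.io
      kinds:
      - BucketServerSideEncryptionConfiguration
      clusters:
      - "*"
```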

Version

Argocd version: v2.12.4+27d1e64

crenshaw-dev commented 4 days ago

Are you sure the controller is on 2.12.4? Not sure how this line can throw a nil pointer exception:

https://github.com/argoproj/argo-cd/blob/27d1e641b6ea99d9f4bf788c032aeaeefd782910/controller/state.go#L864

johnthompson-ybor commented 2 days ago

I thought the same thing, but I just confirmed and that's what version I'm on.

andrii-korotkov-verkada commented 1 day ago

Just to clarify, `argocd version` outputs versions for both argocd and argocd-server. The first one is the CLI, the second one is the server side. We need the second one. Sorry if you already checked that and it's also 2.12.4.

andrii-korotkov-verkada commented 1 day ago

Though maybe we have some memory corruption.

andrii-korotkov-verkada commented 5 hours ago

Can you try with v2.13.1, please?