certController: implement startupProbe

sipao commented 1 year ago

The certificate for the webhook is not being created. I would like to know how to solve this problem.

The cluster runs on Google Kubernetes Engine. It is not a private cluster.

external-secrets is added as a Helm dependency and installed that way.

Chart.yaml

apiVersion: v2
name: external-secrets
type: application
version: 0.1.0
dependencies:
  - name: external-secrets
    repository: https://charts.external-secrets.io
    version: 0.8.1

values.yaml

external-secrets:
  installCRDs: false

kubectl get pods -n external-secrets

NAME                                               READY   STATUS             RESTARTS        AGE
external-secrets-5d6bd4dd54-fhf7t                  1/1     Running            0               50m
external-secrets-cert-controller-c485f7fb6-5gnsd   0/1     Running            0               50m
external-secrets-webhook-55d954796f-7mvjn          0/1     CrashLoopBackOff   10 (2m3s ago)   50m

part of cert-controller's logs

{"level":"error","ts":1681119717.0161457,"logger":"controllers.webhook-certs-updater","msg":"could not update webhook config","Webhookconfig":"/externalsecret-validate","error":"ca cert not yet ready","stacktrace":"github.com/external-secrets/external-secrets/pkg/controllers/webhookconfig.(*Reconciler).Reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/webhookconfig/webhookconfig.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":1681119717.0163078,"msg":"Reconciler error","controller":"validatingwebhookconfiguration","controllerGroup":"admissionregistration.k8s.io","controllerKind":"ValidatingWebhookConfiguration","ValidatingWebhookConfiguration":{"name":"externalsecret-validate"},"namespace":"","name":"externalsecret-validate","reconcileID":"bb4476a1-c67a-4f97-a790-dd6411dd89bf","error":"ca cert not yet ready","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235"}
{"level":"info","ts":1681119717.0165868,"logger":"controllers.webhook-certs-updater","msg":"updating webhook config","Webhookconfig":"/secretstore-validate"}
{"level":"error","ts":1681119717.017655,"logger":"controllers.webhook-certs-updater","msg":"failed to inject conversion webhook","CustomResourceDefinition":"/externalsecrets.external-secrets.io","error":"unexpected crd conversion webhook config","stacktrace":"github.com/external-secrets/external-secrets/pkg/controllers/crds.(*Reconciler).Reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/crds/crds_controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":1681119717.0180433,"msg":"Reconciler error","controller":"customresourcedefinition","controllerGroup":"apiextensions.k8s.io","controllerKind":"CustomResourceDefinition","CustomResourceDefinition":{"name":"externalsecrets.external-secrets.io"},"namespace":"","name":"externalsecrets.external-secrets.io","reconcileID":"ae043449-3578-4e2f-9e6e-dfd5694c12c7","error":"unexpected crd conversion webhook config","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235"}
{"level":"info","ts":1681119747.5271537,"logger":"controller-runtime.healthz","msg":"healthz check failed","statuses":[{},{}]}
{"level":"info","ts":1681119752.5275629,"logger":"controller-runtime.healthz","msg":"healthz check failed","statuses":[{},{}]}

part of webhook's logs

{"level":"info","ts":1681120623.1140406,"logger":"setup","msg":"validating certs"}
{"level":"error","ts":1681120623.1144512,"logger":"setup","msg":"invalid certs. retrying...","error":"stat /tmp/certs/tls.crt: no such file or directory","stacktrace":"github.com/external-secrets/external-secrets/cmd.waitForCerts\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:188\ngithub.com/external-secrets/external-secrets/cmd.glob..func3\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/external-secrets/external-secrets/cmd.Execute\n\t/home/runner/work/external-secrets/external-secrets/cmd/root.go:221\nmain.main\n\t/home/runner/work/external-secrets/external-secrets/main.go:21\nruntime.main\n\t/opt/hostedtoolcache/go/1.19.7/x64/src/runtime/proc.go:250"}
{"level":"info","ts":1681120633.115198,"logger":"setup","msg":"validating certs"}
{"level":"error","ts":1681120633.115328,"logger":"setup","msg":"invalid certs. retrying...","error":"stat /tmp/certs/tls.crt: no such file or directory","stacktrace":"github.com/external-secrets/external-secrets/cmd.waitForCerts\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:188\ngithub.com/external-secrets/external-secrets/cmd.glob..func3\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/external-secrets/external-secrets/cmd.Execute\n\t/home/runner/work/external-secrets/external-secrets/cmd/root.go:221\nmain.main\n\t/home/runner/work/external-secrets/external-secrets/main.go:21\nruntime.main\n\t/opt/hostedtoolcache/go/1.19.7/x64/src/runtime/proc.go:250"}

Thanks for your help.

moolen commented 1 year ago

Can you post the CRD yaml? I think it doesn't have a .spec.conversion.webhook.clientConfig. Did you try to do a clean re-install of the operator? Be warned: don't do this in production.

moolen commented 1 year ago

I just read installCRDs: false. If you don't install them via the helm chart, how do you install them? Is there a reason you can't use them from the helm chart?

sipao commented 1 year ago

Can you post the CRD yaml? I think it doesn't have a .spec.conversion.webhook.clientConfig. Did you try to do a clean re-install of the operator? Be warned: don't do this in production.

How can I check the CRD YAML? Thanks. I can try anything, because this is a development environment.

If you don't install them via the helm chart, how do you install them?

The CRDs are installed via Kustomization. The reason is that setting installCRDs: true caused an "already exists" error when I tried it.

resources:
  - github.com/external-secrets/external-secrets/config/crds/bases?ref=v0.8.1

It is then installed using Helm:

helm dependency build
helm install external-secrets . -f values.dev.yaml -n external-secrets

Is there a reason you can't use them from the helm chart?

Because I want to keep the configuration values as code in values.yaml. If you have any other good ideas, I would like to know.

moolen commented 1 year ago

You should see a .spec.conversion node in the CRD resource yaml:

$ kubectl get crd externalsecrets.external-secrets.io -o yaml | yq .spec.conversion
strategy: Webhook
webhook:
  clientConfig:
    caBundle: "...."
    service:
      name: external-secrets-webhook
      namespace: default
      path: /convert
      port: 443
  conversionReviewVersions:
    - v1

If not, you should install the CRDs via the helm chart. The CRDs from the source you're using don't have the conversion spec set, they are not supposed to work in conjunction with the certController.

ivanov-danil commented 1 year ago

I have the same problem during the first install (version 0.8.1). My release fails, but after 1-2 minutes cert-controller and webhook start and everything is OK. I install the CRDs from the same Helm chart, but as a bundle from crds/bundle.yaml.

{"level":"error","ts":1681978747.9436102,"logger":"setup","msg":"invalid certs. retrying...","error":"stat /tmp/certs/tls.crt: no such file or directory","stacktrace":"github.com/external-secrets/external-secrets/cmd.waitForCerts\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:188\ngithub.com/external-secrets/external-secrets/cmd.glob..func3\n\t/home/runner/work/external-secrets/external-secrets/cmd/webhook.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/external-secrets/external-secrets/cmd.Execute\n\t/home/runner/work/external-secrets/external-secrets/cmd/root.go:221\nmain.main\n\t/home/runner/work/external-secrets/external-secrets/main.go:21\nruntime.main\n\t/opt/hostedtoolcache/go/1.19.7/x64/src/runtime/proc.go:250"}

kubectl get crd externalsecrets.external-secrets.io -o yaml | yq .spec.conversion
strategy: Webhook
webhook:
  clientConfig:
    caBundle: ''
    service:
      name: external-secrets-operator-webhook
      namespace: external-secrets-operator
      path: /convert
      port: 443
  conversionReviewVersions:
    - v1

moolen commented 1 year ago

but after 1-2 minutes cert-controller and webhook starting and all is ok

This is expected, it takes a while until everything propagates through the system.

ivanov-danil commented 1 year ago

But the readiness probe fails more than 3 times. That's not good :( If this is expected, maybe the startup sequence of the service should be refactored (an initContainer, maybe?). One more question: can I prioritize the start order of the deployments (e.g. certController starts first and passes its readiness probe, then the webhook starts without any problems once certController is 1/1)? Could that solve the problem?

moolen commented 1 year ago

But readiness probe failing more 3 times. It's not good :(

It is completely fine that the readiness probe fails: the service is not yet ready, and there is nothing wrong with that. Why do you see this as an issue?

i can prioritize start of deployments [...] certController starting first, [...] next starting webhook.

It doesn't work like that. certController becomes ready once the webhook component becomes ready.

For context:

1. certController generates a TLS private key / cert for the webhook. This is stored in a Kind=Secret.
2. certController injects the caBundle into (1) the CRD conversion webhook and (2) the validating webhook.
3. webhook reads the TLS certificate from the Kind=Secret generated in step 1. Once that is available, both certController and webhook become ready.

It takes some time until the TLS credentials stored in a Kind=Secret propagate to the webhook component. See --sync-frequency (default 1m) in the kubelet config.

This is the delay you're observing.
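
To illustrate where that delay comes from, here is a minimal sketch of a Pod that mounts the certificate Secret as a volume. Only the mount path /tmp/certs is taken from the webhook logs above; the Pod name, container name, image, volume name and Secret name are assumptions made for illustration and are not copied from the chart. The sketch also assumes the Secret already exists when the Pod starts but does not yet contain tls.crt.

```yaml
# Sketch only: every name except /tmp/certs is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: webhook-example                          # hypothetical
spec:
  containers:
    - name: webhook                              # hypothetical container name
      image: example.invalid/webhook:placeholder # placeholder image
      volumeMounts:
        - name: certs
          mountPath: /tmp/certs                  # path seen in the webhook logs
          readOnly: true
  volumes:
    - name: certs
      secret:
        secretName: external-secrets-webhook     # assumed Secret name
```

With a mount like this, the files under /tmp/certs are only refreshed when the kubelet next syncs the Pod, so tls.crt can show up some time after certController has written it to the Secret.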

ivanov-danil commented 1 year ago

It doesn't work like that. certController becomes ready once the webhook component becomes ready.

Thanks for explaining!

It is completely fine that readiness probe fails: the service is not yet ready, there is nothing wrong about it. Or why do you think this as an issue?

Let me explain: the readiness probe has an "initialDelaySeconds" setting, which, as you know, defines when probing starts. Logically, once "initialDelaySeconds" has elapsed, the app should be ready to work (unless there is a problem with the app). We use a tool called "Werf". It monitors the install process until all apps are 1/1 (and the first release install fails every time). Maybe it's a special case, but given how apps and probes are designed, after the "initialDelaySeconds" period the app should be able to pass its readiness probe unless something is wrong. As you say, it works by design, so it's not a problem.

moolen commented 1 year ago

I see. Well, I guess it would be best to add a startupProbe to the certController.

I never noticed that issue, because Helm handles application readiness after installation differently. You pass --wait together with --timeout <duration>. If the application(s) do not become ready within the timeout, the installation is considered to have failed.

Werf seems to interpret it differently: solely based on the readiness probe, without a timeout at all.
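
As a rough idea of what the startupProbe suggested above could look like, here is a container-spec fragment. It is a sketch, not the chart's actual template: the container name, the /readyz path, port 8081 and all thresholds are assumptions chosen for illustration.

```yaml
# Sketch only: container name, path, port and thresholds are assumptions.
containers:
  - name: cert-controller          # hypothetical container name
    startupProbe:
      httpGet:
        path: /readyz              # assumed health endpoint
        port: 8081                 # assumed health port
      periodSeconds: 5
      failureThreshold: 36         # tolerates up to 5s * 36 = 180s of startup
    readinessProbe:
      httpGet:
        path: /readyz
        port: 8081
      periodSeconds: 5
```

Kubernetes holds back the readiness probe until the startup probe has succeeded, so the slow certificate propagation would no longer be reported as failed readiness checks. The trade-off is that the container is restarted if the startup probe never succeeds within the configured window.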

moolen commented 1 year ago

I've changed the title + labels to reflect that, or do you @sipao have any issues regarding your original comment?

sipao commented 1 year ago

Sorry for the late reply.

kubectl get crd externalsecrets.external-secrets.io -o yaml | yq .spec.conversion
strategy: None

sipao commented 1 year ago

I followed the guide and installed it, and it probably worked.

   external-secrets/external-secrets \
    -n external-secrets \
    --create-namespace \
    --set installCRDs=true

k get pods -n external-secrets
NAME                                               READY   STATUS    RESTARTS   AGE
external-secrets-5d6bd4dd54-q9mgg                  1/1     Running   0          3m10s
external-secrets-cert-controller-c485f7fb6-dsxmx   1/1     Running   0          3m10s
external-secrets-webhook-55d954796f-zqjs9          1/1     Running   0          3m10s

Why didn't the following method work?

Installation of crds in kustomization resources
Dependencies in chart.yaml

moolen commented 1 year ago

Why didn't the following method work?

The CRDs from kustomization don't have the conversion spec set, they are not supposed to work in conjunction with the certController. The conversion spec contains the Kubernetes Kind=Service name and namespace. These are variable values and depend on the installation of the operator/webhook/certcontroller components.
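
For a wrapper chart like the one in the original post, keeping the configuration as code while letting the subchart render the CRDs might look like the sketch below. It reuses the key layout from the values.yaml shown earlier; whether this fits a given setup is an assumption.

```yaml
# values.yaml of the wrapper chart (sketch)
external-secrets:        # key matches the dependency name in Chart.yaml
  installCRDs: true      # let the subchart install the CRDs, including .spec.conversion
```

The Kustomize CRD resource would then be dropped. CRDs that were already applied from that source would presumably have to be removed first, which is only reasonable in a development environment.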

ivanov-danil commented 1 year ago

Werf seems to interpret it differently: solely based on readiness probe, not having a timeout after all.

Werf has a timeout flag, but I don't use that feature. For now I solved the problem with annotations on the deployments that ignore readiness probe failures for the webhook and certController containers for 2 minutes (if the readiness probes pass, the release finishes successfully; if they fail, Werf waits up to 2 extra minutes for the readiness probe to succeed or fail):

"werf.io/ignore-readiness-probe-fails-for-webhook": "2m"
"werf.io/ignore-readiness-probe-fails-for-certcontoller": "2m"

I see. Well, i guess it would be best to add a startupProbe to the certController.

Thanks for that!

sipao commented 1 year ago

Thank you very much for the swift reply 🙏 🤩

Thank you for the great project.

certController: implement startupProbe #2217