bitnami-labs / sealed-secrets

A Kubernetes controller and tool for one-way encrypted Secrets
Apache License 2.0

Secret deleted by the garbage collector with delay #1599

Open aso-adeo opened 3 weeks ago

aso-adeo commented 3 weeks ago

Which component: sealed-secrets-controller:0.27.1

Describe the bug ArgoCD is replacing the SealedSecret: it deletes the SealedSecret and recreates it a few milliseconds later. The sealed-secrets controller cannot unseal the new SealedSecret because the target Secret already exists. The garbage collector, acting with less than 5 seconds of delay, then sees the Secret with an obsolete SealedSecret ownerReference UID and deletes it. Since the controller has already given up unsealing the SealedSecret after its 5 attempts, we end up without the Secret.

To Reproduce It is not easily reproducible: it did not happen on every cluster where we ran through this scenario.

Expected behavior We expect the sealed-secrets controller to retry unsealing the secret with an exponential backoff instead of making all of its attempts within a few milliseconds. It is not rare for the garbage collector to act with some delay.

Version of Kubernetes: 1.28 & 1.29

Server Version: v1.29.8-gke.1031000
alemorcuq commented 4 days ago

The retries are already done with an exponential backoff, but since the retry limit is just 5 they happen too fast:

$ kubectl logs -n kube-system deploy/sealed-secrets-controller | grep Updating | nl -v 0
     0  time=2024-09-29T11:49:35.764Z level=INFO msg=Updating key=default/my-secret
     1  time=2024-09-29T11:49:35.777Z level=INFO msg=Updating key=default/my-secret
     2  time=2024-09-29T11:49:35.794Z level=INFO msg=Updating key=default/my-secret
     3  time=2024-09-29T11:49:35.822Z level=INFO msg=Updating key=default/my-secret
     4  time=2024-09-29T11:49:35.867Z level=INFO msg=Updating key=default/my-secret
     5  time=2024-09-29T11:49:35.955Z level=INFO msg=Updating key=default/my-secret
     6  time=2024-09-29T11:49:36.123Z level=INFO msg=Updating key=default/my-secret
     7  time=2024-09-29T11:49:36.451Z level=INFO msg=Updating key=default/my-secret
     8  time=2024-09-29T11:49:37.098Z level=INFO msg=Updating key=default/my-secret
     9  time=2024-09-29T11:49:38.389Z level=INFO msg=Updating key=default/my-secret
    10  time=2024-09-29T11:49:40.957Z level=INFO msg=Updating key=default/my-secret
    11  time=2024-09-29T11:49:46.088Z level=INFO msg=Updating key=default/my-secret
    12  time=2024-09-29T11:49:56.338Z level=INFO msg=Updating key=default/my-secret
    13  time=2024-09-29T11:50:16.823Z level=INFO msg=Updating key=default/my-secret
    14  time=2024-09-29T11:50:57.793Z level=INFO msg=Updating key=default/my-secret
    15  time=2024-09-29T11:52:19.723Z level=INFO msg=Updating key=default/my-secret
    16  time=2024-09-29T11:55:03.572Z level=INFO msg=Updating key=default/my-secret

As you can see, each retry doubles the previous wait time, but it's not until the 9th retry that it starts waiting more than 1 second, and by the 15th retry it is already waiting more than 1 minute.
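For reference, here is a small sketch of that schedule. I'm assuming a 5 ms base delay that doubles on every failed requeue (the usual client-go default, and what the gaps between the log lines above suggest); the exact constants in the controller may differ:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assumption: a 5 ms base delay that doubles on every failed requeue,
	// matching the gaps between the "Updating" log lines above.
	base := 5 * time.Millisecond
	total := time.Duration(0)
	for retry := 1; retry <= 16; retry++ {
		delay := base * time.Duration(1<<(retry-1))
		total += delay
		fmt.Printf("retry %2d: wait %v (elapsed since first failure: %v)\n",
			retry, delay, total.Round(time.Millisecond))
	}
}
```

With the current limit of 5, the whole schedule is exhausted in well under a second after the first failure.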

A quick solution for this would be to simply increase the maximum number of retries (currently 5). What do you think?
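To make that concrete, here is a minimal sketch of the usual client-go workqueue retry pattern (the names maxRetries and processItem are mine, not the actual controller code): once NumRequeues reaches the limit, the key is forgotten and never retried, so the fix is essentially bumping that one constant.

```go
package sketch

import "k8s.io/client-go/util/workqueue"

// maxRetries mirrors the limit discussed in this issue; processItem stands in
// for the controller's unseal logic. Both names are hypothetical.
const maxRetries = 5

func worker(queue workqueue.RateLimitingInterface, processItem func(key string) error) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)

		if err := processItem(key); err == nil {
			// Success: reset the per-item failure counter.
			queue.Forget(key)
		} else if queue.NumRequeues(key) < maxRetries {
			// Failed fewer than maxRetries times: requeue with exponential backoff.
			queue.AddRateLimited(key)
		} else {
			// Retry budget exhausted: the key is dropped and the Secret is
			// never recreated, which is what this issue describes.
			queue.Forget(key)
		}
		queue.Done(key)
	}
}
```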

cc @agarcia-oss

aso-adeo commented 3 days ago

Yes, I think we should increase the default maximum number of retries to 15.
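As a rough sense of scale, assuming the 5 ms doubling schedule shown above, 15 retries would span about 5 ms × (2^15 − 1) ≈ 164 s, i.e. the controller would keep trying for roughly two and three-quarter minutes after the first failure instead of a fraction of a second, which should comfortably absorb typical garbage-collector delays.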