aso-adeo opened 3 weeks ago
The retries already use an exponential backoff, but since the retry limit is just 5, they happen too fast:
$ kubectl logs -n kube-system deploy/sealed-secrets-controller | grep Updating | nl -v 0
0 time=2024-09-29T11:49:35.764Z level=INFO msg=Updating key=default/my-secret
1 time=2024-09-29T11:49:35.777Z level=INFO msg=Updating key=default/my-secret
2 time=2024-09-29T11:49:35.794Z level=INFO msg=Updating key=default/my-secret
3 time=2024-09-29T11:49:35.822Z level=INFO msg=Updating key=default/my-secret
4 time=2024-09-29T11:49:35.867Z level=INFO msg=Updating key=default/my-secret
5 time=2024-09-29T11:49:35.955Z level=INFO msg=Updating key=default/my-secret
6 time=2024-09-29T11:49:36.123Z level=INFO msg=Updating key=default/my-secret
7 time=2024-09-29T11:49:36.451Z level=INFO msg=Updating key=default/my-secret
8 time=2024-09-29T11:49:37.098Z level=INFO msg=Updating key=default/my-secret
9 time=2024-09-29T11:49:38.389Z level=INFO msg=Updating key=default/my-secret
10 time=2024-09-29T11:49:40.957Z level=INFO msg=Updating key=default/my-secret
11 time=2024-09-29T11:49:46.088Z level=INFO msg=Updating key=default/my-secret
12 time=2024-09-29T11:49:56.338Z level=INFO msg=Updating key=default/my-secret
13 time=2024-09-29T11:50:16.823Z level=INFO msg=Updating key=default/my-secret
14 time=2024-09-29T11:50:57.793Z level=INFO msg=Updating key=default/my-secret
15 time=2024-09-29T11:52:19.723Z level=INFO msg=Updating key=default/my-secret
16 time=2024-09-29T11:55:03.572Z level=INFO msg=Updating key=default/my-secret
As you can see, each retry doubles the previous wait time, but it's not until the 9th retry that it starts waiting more than 1 second, and by the 15th retry it is already waiting more than 1 minute.
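For a sense of scale, here is a small Go sketch of that schedule, assuming client-go's default per-item rate limiter (a 5ms base delay that doubles per retry, capped at 1000s); this is consistent with the gaps in the log above, though the controller's actual limiter configuration may differ:

```go
package main

import (
	"fmt"
	"time"
)

// Sketch: print the delay schedule of an exponential per-item rate
// limiter. The 5ms base and 1000s cap are client-go's defaults for
// workqueue.DefaultControllerRateLimiter; assumed, not confirmed,
// to be what sealed-secrets uses.
func main() {
	base := 5 * time.Millisecond
	maxDelay := 1000 * time.Second
	total := time.Duration(0)
	for retry := 0; retry <= 15; retry++ {
		delay := base << uint(retry) // 5ms, 10ms, 20ms, ...
		if delay > maxDelay {
			delay = maxDelay
		}
		total += delay
		fmt.Printf("retry %2d: wait %-12v cumulative %v\n", retry, delay, total)
	}
}
```

With the current cap of 5 retries, the cumulative wait is only about 150ms before the controller gives up; a cap of 15 stretches the retry window to a few minutes.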
A quick solution for this would be to just increase the number of max retries (currently at 5). What do you think?
cc @agarcia-oss
Yes, I think we should increase the default number of max retries to 15.
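For context, this kind of limit is typically enforced with the standard client-go workqueue pattern; a minimal sketch of where such a maxRetries constant lives (illustrative names, not necessarily sealed-secrets' exact code):

```go
package controller

import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/util/workqueue"
)

const maxRetries = 15 // the knob under discussion; currently 5

type Controller struct {
	queue workqueue.RateLimitingInterface
}

// handleErr decides whether a failed key is retried with backoff
// or dropped for good.
func (c *Controller) handleErr(err error, key string) {
	if err == nil {
		c.queue.Forget(key) // success: reset this item's backoff counter
		return
	}
	if c.queue.NumRequeues(key) < maxRetries {
		c.queue.AddRateLimited(key) // requeue with exponential backoff
		return
	}
	// Too many failures: drop the item and surface the error.
	c.queue.Forget(key)
	utilruntime.HandleError(fmt.Errorf("giving up on %q: %w", key, err))
}
```

Raising maxRetries is cheap here because the backoff keeps doubling, so the extra retries mostly add waiting time rather than load.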
Which component: sealed-secrets-controller:0.27.1
Describe the bug
ArgoCD is replacing the SealedSecret: it deletes the SealedSecret and recreates it a few milliseconds later. The sealed-secrets controller then cannot unseal the new SealedSecret because the Secret already exists. Less than 5 seconds later, the garbage collector sees the Secret carrying an obsolete SealedSecret ownerReference UID and deletes it. Since the controller has already given up unsealing the SealedSecret after its 5 attempts, we no longer have the Secret at all.
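One way to confirm this race after the fact is to compare the UID of the recreated SealedSecret with the UID recorded in the Secret's ownerReference (resource names here taken from the log above); if they differ, the garbage collector treats the Secret as orphaned and deletes it:

```
$ kubectl get sealedsecret my-secret -n default -o jsonpath='{.metadata.uid}{"\n"}'
$ kubectl get secret my-secret -n default -o jsonpath='{.metadata.ownerReferences[0].uid}{"\n"}'
```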
To Reproduce
It's not easily reproducible: it did not happen on every cluster where we ran through this scenario.
Expected behavior
We expect the sealed-secrets controller to retry unsealing the Secret with an exponential backoff over a longer window, instead of making all its attempts within a few milliseconds. It is not rare for the garbage collector to act with some delay.
Version of Kubernetes: 1.28 & 1.29
kubectl version: