emberstack / kubernetes-reflector

Custom Kubernetes controller that can be used to replicate secrets, configmaps and certificates.
MIT License
1.05k stars 94 forks source link

Session closed message while proccessing #451

Open arjun-beathi opened 3 months ago

arjun-beathi commented 3 months ago

Hello, we have an AKS cluster(v1.29.4 and 7.1.262 reflector version) with 4500 Secrets out of which only 50 needs to be reflected between namespaces. The Secret mirror scan is not going through all the secrets.

Not sure whether it is number of secrets thing or something else is going on? Any suggestions?

` - name: ES_SerilogMinimumLevelDefault value: Information

Though the watcher timeout is 3600, why session is getting closed around 90sec? We don't use any configmap reflection but that session is closing at 3600secs.

2024-06-13 20:27:57.458 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:05.7533567. Faulted: False. 2024-06-13 20:29:32.344 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:34.8828116. Faulted: False. 2024-06-13 20:31:08.123 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:35.7787736. Faulted: False. 2024-06-13 20:32:27.219 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:19.0962849. Faulted: False. 2024-06-13 20:33:20.740 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:53.5203193. Faulted: False. 2024-06-13 20:34:16.275 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:55.5348593. Faulted: False. 2024-06-13 20:34:53.762 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:37.4871686. Faulted: False. 2024-06-13 20:35:45.568 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:51.8053794. Faulted: False. 2024-06-13 20:36:28.486 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:42.9175509. Faulted: False. 2024-06-13 20:37:27.092 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:00:58.6051816. Faulted: False. 2024-06-13 20:38:38.153 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:11.0611364. Faulted: False.

On Average it is taking 3 secs to sync/validate a secret - the maximum session I notice is 00:01:37.1330931. In that time it is reflecting 33(99Secs) secrets and ignoring remaining secrets.

One thing we want to consider is, we should filter the secrets based on annotations instead of reading all the secrets. kubectl get secrets -A -o jsonpath='{range .items[?(@.metadata.annotations.reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.metadata.annotations.reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces}{"\n"}

Thanks

arjun-beathi commented 3 months ago

Update: I let the reflector pod running whole night. One instance the SecretMirror seesion went up to 1min 57secs.

2024-06-14 07:42:37.271 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:01:52.2517163. Faulted: False.

The secrets which I was hoping for got synced/created in that iteration.

winromulus commented 3 months ago

@arjun-beathi the session timeout forces the watcher to close early if k8s doesn't close it faster than that. Consider it the maximum watcher session length. It looks like your k8s instance is closing watcher sessions ~99 seconds. Which is odd (usually k8s closes it in around 40 minutes). The session length is controlled by the k8s control plan unless it reaches the maximum timeout (the one in config).

winromulus commented 3 months ago

Also watchers don't work like that. Reflector does not get the secrets, it requests a watcher for the secrets from k8s (which only supports label filters). A watcher gets all events (added for existing and new secrets and updated for newly modified secrets). The difference is that update events are immediately received, instead of polling for changes.

arjun-beathi commented 3 months ago

@winromulus Thanks for the clarification. ~99Sec thing is coming from controlplane.?! I will pursue on that side. I will verify all other clusters as well. keep you posted. May be AKS tunning required.

In this particular scenario, when the session executed for 112Sec, then it could able to reflect the secret. We got lucky there.

Note: The other sessions like configMapMirror or namespace watcher are watching for the specified time(3600s), only secrets have this issue.

image image image
arjun-beathi commented 3 months ago

@winromulus Update: We have around 50 clusters, most of them are behaving correctly in session duration time. Except in 4 clusters. So, I doubt it is k8s controlplane. Any other place to look?

image

arjun-beathi commented 3 months ago

@winromulus Any pointers/suggestions ?

arjun-beathi commented 3 months ago

One of the cluster has 11k sh.helm.release secrets. I deleted some of them to 3k, now reflector secret mirror session is 1hour. Not sure whether some other process is loading kube api-server and impacting reflector process.

Before: ❯ k get secret -A | grep 'sh.helm' | wc -l 11014

After: ❯ k get secret -A | grep 'sh.helm' | wc -l 3235