loganmzz opened 6 months ago
I have the same problem, except mine reliably hangs about once every two weeks. Any luck finding a solution?
Hi @OlehKyrtsun,
For my information, which version are you using? I remember having an issue with the Go Kubernetes client disconnecting from a watch without any notification.
I didn't look at how Reflector works, but I saw a Kubernetes client update since 7.0.193 (the version I currently use). I'm deploying an update and have written a checker script. Now I just have to wait and see how it behaves over the coming months.
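A minimal sketch of what such a checker could look like, assuming a cert-manager source secret mirrored into another namespace (all namespaces and names below are placeholders, not from this thread):

```bash
#!/usr/bin/env bash
# Drift checker sketch: compare the source secret managed by cert-manager
# with the copy Reflector is expected to mirror, and restart Reflector if
# they have diverged. All namespaces/names are hypothetical placeholders.
set -euo pipefail

SRC_NS="cert-manager"     # namespace of the source secret (assumption)
DST_NS="istio-system"     # namespace Reflector mirrors into (assumption)
SECRET="wildcard-tls"     # secret name (assumption)

# Dots in data keys must be escaped in kubectl's jsonpath syntax.
src=$(kubectl -n "$SRC_NS" get secret "$SECRET" -o jsonpath='{.data.tls\.crt}')
dst=$(kubectl -n "$DST_NS" get secret "$SECRET" -o jsonpath='{.data.tls\.crt}' 2>/dev/null || true)

if [[ "$src" != "$dst" ]]; then
  echo "mirror of $SECRET is stale; restarting reflector" >&2
  kubectl -n reflector rollout restart deployment reflector
fi
```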
I'm using Reflector version 7.1.238 on Kubernetes v1.28. For now I've decided to just write a job that restarts it once a week.
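A hedged sketch of such a weekly restart job, created imperatively with kubectl. The namespace, deployment name, and ServiceAccount are assumptions; the ServiceAccount needs RBAC permission to patch deployments in that namespace:

```bash
# Weekly restart CronJob sketch. Namespace "reflector", deployment
# "reflector", and ServiceAccount "reflector-restarter" are assumptions.
kubectl -n reflector create cronjob reflector-restart \
  --image=bitnami/kubectl:1.28 \
  --schedule="0 3 * * 1" \
  -- kubectl -n reflector rollout restart deployment reflector

# `kubectl create cronjob` cannot set a ServiceAccount directly, so attach
# it with a JSON patch afterwards.
kubectl -n reflector patch cronjob reflector-restart --type=json -p '[
  {"op":"add",
   "path":"/spec/jobTemplate/spec/template/spec/serviceAccountName",
   "value":"reflector-restarter"}
]'
```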
To add onto this, we are on Reflector 7.1.262 and have had multiple issues in the last few days with Reflector not copying new certs into the istio-system namespace. The last logs in the pod look like the following, which shows it actually hasn't been working since April 22, 2024 (over a month ago). Deleting the pod and letting a new one come up fixes the issue, but it would be great if it didn't just silently die.
2024-04-22 16:07:21.430 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
2024-04-22 16:07:21.435 +00:00 [ERR] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Faulted due to exception.
k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Unauthorized', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
at k8s.AbstractKubernetes.ICoreV1Operations_ListConfigMapForAllNamespacesWithHttpMessagesAsync[T](Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, Nullable`1 pretty, String resourceVersion, String resourceVersionMatch, Nullable`1 sendInitialEvents, Nullable`1 timeoutSeconds, Nullable`1 watch, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
at k8s.AbstractKubernetes.k8s.ICoreV1Operations.ListConfigMapForAllNamespacesWithHttpMessagesAsync(Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, Nullable`1 pretty, String resourceVersion, String resourceVersionMatch, Nullable`1 sendInitialEvents, Nullable`1 timeoutSeconds, Nullable`1 watch, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
at k8s.WatcherExt.<>c__DisplayClass1_0`2.<<MakeStreamReaderCreator>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
2024-04-22 16:07:21.440 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:00.0091962. Faulted: True.
2024-04-22 16:07:26.442 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
I'll be creating a CronJob in k8s for pod deletions; haven't found another solution yet.
@toni-rib-skydio "funny" that yours got stuck at almost exactly the same date and hour.
Maybe the healthcheck endpoint also requires some improvements.
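If the health endpoint ever reflects watcher state, a liveness probe would let the kubelet recycle the pod automatically. A sketch, with the caveat that the path and port are assumptions, and that per the reports above the endpoint currently stays healthy even while watchers are faulted, so this alone would not help today:

```bash
# Liveness probe sketch: only useful if the health endpoint ever starts
# reporting watcher failures. Path /healthz and port 25080 are assumptions
# about the Reflector container, not confirmed in this thread.
kubectl -n reflector patch deployment reflector --type=json -p '[
  {"op":"add",
   "path":"/spec/template/spec/containers/0/livenessProbe",
   "value":{"httpGet":{"path":"/healthz","port":25080},
            "periodSeconds":60,"failureThreshold":3}}
]'
```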
@loganmzz where is this hosted? We've seen similar issues with authorization on automatic node scaling when hosted in Google Cloud. Can you tell me more about the environment?
I'm on AWS with dynamic nodes (spot instances)
@loganmzz I don't have an environment like that to simulate with, but it seems the service account from those nodes is not authorized to contact the control plane. I'm not sure how to debug this.
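One hedged starting point is to separate RBAC (which would produce 403s) from authentication (the 401 seen in the logs above). A sketch, assuming the release runs as ServiceAccount reflector in namespace reflector:

```bash
# RBAC check: can the ServiceAccount list/watch resources cluster-wide?
# (A "no" here would produce 403s, not the 401 seen in the logs.)
kubectl auth can-i list secrets --all-namespaces \
  --as=system:serviceaccount:reflector:reflector
kubectl auth can-i watch configmaps --all-namespaces \
  --as=system:serviceaccount:reflector:reflector

# The 401 suggests the bearer token itself was rejected. On clusters with
# bound/projected ServiceAccount tokens (e.g. EKS), the kubelet rotates the
# token file; a client that caches the token at startup will eventually
# send an expired one. Dump the token the pod is currently mounting and
# inspect its "exp" claim with a JWT decoder:
kubectl -n reflector exec deploy/reflector -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token
```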
I've noticed an interesting thing. I have Reflector set up to transfer SSL certificates between namespaces in k8s; when a certificate was reissued, Reflector caught this error but still updated the certificate. I'm running on AWS. I'm now waiting for the next cert update.
Versions:
- Kubernetes: v1.28.9-eks-036c24b
- Reflector: 7.0.193
We encountered an issue with outdated certificates for some services. After investigation, it seems Reflector had been hanging for several weeks, not mirroring updates from the cert-manager certificate secret.
I tried updating an annotation on the given secret, but there was no reaction. However, the HTTP healthcheck is still responding.
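For reference, a hedged sketch of that kind of annotation nudge to try to trigger a watch event (namespace, secret name, and annotation key are illustrative, not the actual values used):

```bash
# Touch the source secret with a throwaway annotation; a healthy watcher
# should see the update event and re-mirror the secret.
kubectl -n cert-manager annotate secret wildcard-tls \
  reflector-test="$(date +%s)" --overwrite
```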
Last logs: