emberstack / kubernetes-reflector

Custom Kubernetes controller that can be used to replicate secrets, configmaps and certificates.
MIT License
1.11k stars 96 forks source link

Reflector hangs after exception #442

Open loganmzz opened 6 months ago

loganmzz commented 6 months ago

Versions:

We encountered issue with outdated certificate for some services. After investigation, it seems Reflector is hanging since several weeks, not mirroring update from Cert Manager certificate secret.

Tried to update an annotation on given secret, but no reaction. However, HTTP healthcheck is still responding.

Last logs:

2024-04-22 14:58:36.937 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2024-04-22 14:58:37.305 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected argocd/argo-workflows-s3 where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
2024-04-22 14:58:37.594 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected cert-manager/wildcard-priv-mycompany-cloud where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 82.
2024-04-22 14:58:37.803 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected global/global-secret where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
2024-04-22 15:13:45.131 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:15:08.1942289. Faulted: False.
2024-04-22 15:13:45.132 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2024-04-22 15:13:45.145 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Session closed. Duration: 00:19:31.9075751. Faulted: False.
2024-04-22 15:13:45.146 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources
2024-04-22 15:13:45.668 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected argocd/argo-workflows-s3 where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
2024-04-22 15:13:45.902 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected cert-manager/wildcard-priv-mycompany-cloud where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 82.
2024-04-22 15:13:46.104 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected global/global-secret where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
2024-04-22 15:18:15.767 +00:00 [ERR] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Faulted due to exception.
System.IO.IOException: The request was aborted.
 ---> System.IO.IOException: The response ended prematurely while waiting for the next frame from the server.
   at System.Net.Http.Http2Connection.<ReadFrameAsync>g__ThrowMissingFrame|69_1()
   at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
   at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
   at System.Net.Http.Http2Connection.Http2Stream.TryReadFromBuffer(Span`1 buffer, Boolean partOfSyncRead)
   at System.Net.Http.Http2Connection.Http2Stream.ReadDataAsync(Memory`1 buffer, HttpResponseMessage responseMessage, CancellationToken cancellationToken)
   at k8s.LineSeparatedHttpContent.CancelableStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadLineAsyncInternal(CancellationToken cancellationToken)
   at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
2024-04-22 15:18:15.772 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Session closed. Duration: 00:04:30.6266915. Faulted: True.
2024-04-22 15:18:15.770 +00:00 [ERR] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Faulted due to exception.
System.IO.IOException: The request was aborted.
 ---> System.IO.IOException: The response ended prematurely while waiting for the next frame from the server.
   at System.Net.Http.Http2Connection.<ReadFrameAsync>g__ThrowMissingFrame|69_1()
   at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
   at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
   at System.Net.Http.Http2Connection.Http2Stream.TryReadFromBuffer(Span`1 buffer, Boolean partOfSyncRead)
   at System.Net.Http.Http2Connection.Http2Stream.ReadDataAsync(Memory`1 buffer, HttpResponseMessage responseMessage, CancellationToken cancellationToken)
   at k8s.LineSeparatedHttpContent.CancelableStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadLineAsyncInternal(CancellationToken cancellationToken)
   at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
2024-04-22 15:18:15.774 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:23:19.9873238. Faulted: True.
2024-04-22 15:18:15.769 +00:00 [ERR] (ES.Kubernetes.Reflector.Core.SecretWatcher) Faulted due to exception.
System.IO.IOException: The request was aborted.
 ---> System.IO.IOException: The response ended prematurely while waiting for the next frame from the server.
   at System.Net.Http.Http2Connection.<ReadFrameAsync>g__ThrowMissingFrame|69_1()
   at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
   at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
   at System.Net.Http.Http2Connection.Http2Stream.TryReadFromBuffer(Span`1 buffer, Boolean partOfSyncRead)
   at System.Net.Http.Http2Connection.Http2Stream.ReadDataAsync(Memory`1 buffer, HttpResponseMessage responseMessage, CancellationToken cancellationToken)
   at k8s.LineSeparatedHttpContent.CancelableStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)
   at System.IO.StreamReader.ReadLineAsyncInternal(CancellationToken cancellationToken)
   at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
2024-04-22 15:18:15.781 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:04:30.6497023. Faulted: True.
2024-04-22 15:18:20.772 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
2024-04-22 15:18:20.772 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources
2024-04-22 15:18:20.783 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2024-04-22 15:18:21.484 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected argocd/argo-workflows-s3 where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
2024-04-22 15:18:21.873 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected cert-manager/wildcard-priv-mycompany-cloud where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 82.
2024-04-22 15:18:22.057 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected global/global-secret where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 51.
OlehKyrtsun commented 6 months ago

I have the same problem, only I have it hangs stabily once every 2 weeks. Any luck finding a solution?

loganmzz commented 6 months ago

I have the same problem, only I have it hangs stabily once every 2 weeks. Any luck finding a solution?

Hi @OlehKyrtsun, For my information which version are you using ? I remember having issue with Go Kubernetes client, disconnecting from watch without notification. I didn't look at how Reflector works, but I saw a Kubernetes client update since 7.0.193 (version I currently use). I'm deploying an update and wrote a checker script. Just have to wait how it behaves for coming monthes.

OlehKyrtsun commented 6 months ago

I'm using reflector version 7.1.238. So far I decided to just write a job that will restart this sub once a week kubernetes - v1.28

toni-rib-skydio commented 6 months ago

To add onto this, we are on reflector 7.1.262 and have had multiple issues in the last few days of reflector not copying new certs into the istio-system namespace. Last logs in the pod show something like this, which actually shows it hasn't been working since April 22, 2024 (over a month ago). Deleting the pod and letting a new one come up fixes the issue, but it would be great if it didn't just silently die.

2024-04-22 16:07:21.430 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
2024-04-22 16:07:21.435 +00:00 [ERR] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Faulted due to exception.
k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Unauthorized', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.ICoreV1Operations_ListConfigMapForAllNamespacesWithHttpMessagesAsync[T](Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, Nullable`1 pretty, String resourceVersion, String resourceVersionMatch, Nullable`1 sendInitialEvents, Nullable`1 timeoutSeconds, Nullable`1 watch, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICoreV1Operations.ListConfigMapForAllNamespacesWithHttpMessagesAsync(Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, Nullable`1 pretty, String resourceVersion, String resourceVersionMatch, Nullable`1 sendInitialEvents, Nullable`1 timeoutSeconds, Nullable`1 watch, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.WatcherExt.<>c__DisplayClass1_0`2.<<MakeStreamReaderCreator>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
   at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
   at ES.Kubernetes.Reflector.Core.Watchers.WatcherBackgroundService`2.ExecuteAsync(CancellationToken stoppingToken) in /src/ES.Kubernetes.Reflector/Core/Watchers/WatcherBackgroundService.cs:line 78
2024-04-22 16:07:21.440 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:00.0091962. Faulted: True.
2024-04-22 16:07:26.442 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
OlehKyrtsun commented 6 months ago

I'll be creating a cronjob in the k8s on pitch deletions, haven't found another solution yet

loganmzz commented 6 months ago

@toni-rib-skydio "funny" yours get stuck closely at same date and hour.

loganmzz commented 6 months ago

I'll be creating a cronjob in the k8s on pitch deletions, haven't found another solution yet

May be healthcheck endpoint also require some improvements.

winromulus commented 5 months ago

@loganmzz where is this hosted? We've seen similar issues with authorization on automatic node scaling when hosted in google cloud. Can you tell me more about the environment?

loganmzz commented 5 months ago

@loganmzz where is this hosted? We've seen similar issues with authorization on automatic node scaling when hosted in google cloud. Can you tell me more about the environment?

I'm on AWS with dynamic nodes (spot instances)

winromulus commented 5 months ago

@loganmzz I don't have an environment like that to simulate. But it seems the authorization for the service account from those nodes is not allowed to contact the control-plane. I'm not sure how to debug this.

OlehKyrtsun commented 5 months ago

I've noticed an interesting thing. I have a reflector built to transfer SSL certificates between newspaces in k8s, when the certificate was reissued reflector caught this error but updated the certificate. I'm running on AWS. I'm waiting for a new cert update now