seanglynn-thrive closed this issue 1 year ago
FYI @olivermeyer @pcorbel @davinchia @marcosmarxm if you have anything to add :)
x-post: https://github.com/airbytehq/airbyte-platform/pull/205#issuecomment-1496712268
In short: you probably need to run multiple workers. By default each worker can accommodate 10 concurrent jobs, and the default replica count for workers is 1. If your jobs take a long time, it's possible that ports will not be reclaimed fast enough for new syncs. If you have 15 concurrent connections, you may want to increase your replica count to 2.
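(For illustration only, a rough capacity check based on the defaults mentioned above; the 10-jobs-per-worker figure is the assumption from that comment, not something I have verified.)

```java
// Rough capacity check, assuming ~10 concurrent jobs per worker replica.
public class WorkerCapacity {

    public static int requiredReplicas(final int concurrentConnections, final int jobsPerWorker) {
        return (int) Math.ceil((double) concurrentConnections / jobsPerWorker);
    }

    public static void main(final String[] args) {
        // 15 concurrent connections at 10 jobs per worker -> 2 replicas.
        System.out.println(requiredReplicas(15, 10));
    }
}
```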
Unfortunately this doesn't help in our case. We already have 8 worker replicas, so if you're right we should be able to handle up to 80 concurrent connections. In practice Airbyte struggles to handle just 30. Also, none of our jobs take over an hour to run, and they are definitely all finished by the time we trigger them again roughly 12 hours later.
To me the symptoms still point to ports not being reclaimed after a sync ends: everything works just fine for some time after restarting the workers (two days in our case, but I suspect this depends on the number of connections and how often they run), and after that all syncs fail consistently. I don't know how to troubleshoot this further though. Hopefully @seanglynn-thrive's additional logging will shed some light.
To add to this: we also have a similar setup with 5 worker replicas and 15 connections, staggered at different times every hour to minimize sync concurrency (connection A runs at 0 minutes past the hour, connection B at 10 minutes past the hour, etc.). Each job takes 1-6 minutes to complete. We have even performed some stress tests where we executed all jobs at once, which caused no issues and returned 0 failures.
Initially, we had a single bulky worker (high memory/CPU allocations) doing all of the heavy lifting, but we started to notice this issue occurring every 24 hours or so, causing an outage. We then scaled out to 3 worker replicas, which made the issue occur less frequently (every 48-72 hours).
We later scaled to 5 replicas, which delayed the issue further, to every 4-5 days.
In our experience, scaling out the workers delays this exception but does not resolve the underlying issue.
So if we can all agree that the issue lies within the KubePortManager's port allocation, I think we can work together to narrow it down.
QS 1: Is it possible that the KubePortManager holds on to ports that were allocated to a job at some point in the past?
There are some stale job pods in the k8s namespace that never completed or reached a healthy state (e.g. Error / Init:Error). Could the KubePortManager be retaining the old ports of these failed/incomplete jobs, which then accumulate over time? (A sketch of this failure mode follows QS 2 below.)
Example:
NAME READY STATUS RESTARTS AGE
source-postgres-read-13100-0-alsyl 0/4 Init:Error 0 29h
source-postgres-read-13264-0-xncxi 0/4 Init:Error 0 15h47m
source-postgres-read-13283-0-lgfkw 0/4 Init:Error 0 14h22m
QS 2: Is there a connection between the KubePortManager class (within the Worker) and the PodSweeper that keeps both in sync with each other? For example, if the pod sweeper deletes old pods at the kubernetes level, is this change reflected in the KubePortManager?
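To make the hypothesis in QS 1 concrete, here is a minimal sketch (assuming a shared port pool backed by a blocking queue) of how ports can drain away if failed pods never return them. All class and method names here are made up for illustration and are not the actual Airbyte implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical port pool, illustrative only.
public class PortPool {

    private final BlockingQueue<Integer> available = new LinkedBlockingDeque<>();

    public PortPool(final int firstPort, final int count) {
        for (int p = firstPort; p < firstPort + count; p++) {
            available.offer(p);
        }
    }

    // Blocks for up to the timeout; returns null once every port is considered
    // "in use", which would produce exactly the kind of failure reported here.
    public Integer take() throws InterruptedException {
        return available.poll(10, TimeUnit.MINUTES);
    }

    // If a job pod dies (e.g. Init:Error) before anything calls release(),
    // the port is leaked and the pool drains a little further each time.
    public void release(final int port) {
        available.offer(port);
    }
}
```

If nothing ties pod deletion (e.g. by the pod sweeper) back to release(), the pool only ever shrinks, which would match the behaviour of everything working for a few days and then all syncs failing at once.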
Another approach we tried to avoid this issue was to increase the number of TEMPORAL_WORKER_PORTS from 40 to 80. This did not give us the results we expected :(
Is it possible/recommended to significantly increase the number of ports available under this configuration?
I put together https://github.com/airbytehq/airbyte-platform/pull/217 in an attempt to fix this issue.
I believe the problem actually occurs during Pod creation. If the init container fails, the ports are never reclaimed because this all happens in the constructor. This may lead to port exhaustion like we are experiencing here.
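For reference, a simplified sketch of the pattern described above, returning any already-taken ports when pod creation or the init container fails part-way through. This is not the actual PR, and all names are illustrative:

```java
import java.util.concurrent.BlockingQueue;

// Simplified sketch only; the real pod-process constructor is far more involved.
public class PodLauncher {

    private final BlockingQueue<Integer> portPool;

    public PodLauncher(final BlockingQueue<Integer> portPool) {
        this.portPool = portPool;
    }

    public void launch() throws Exception {
        final Integer stdoutPort = portPool.take();
        final Integer stderrPort = portPool.take();
        try {
            createPodAndWaitForInit(stdoutPort, stderrPort);
        } catch (final Exception e) {
            // Without this cleanup, ports taken before the failure are never
            // returned, and the pool slowly drains until take() starts failing.
            portPool.offer(stdoutPort);
            portPool.offer(stderrPort);
            throw e;
        }
    }

    private void createPodAndWaitForInit(final int stdoutPort, final int stderrPort) throws Exception {
        // Placeholder for Kubernetes pod creation and the init-container wait.
    }
}
```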
We upgraded after the PR above was merged + released. At first it seemed to have fixed the issue as we went almost three full days with no issues, but we just started getting the same errors in our syncs:
2023-05-18 09:44:57 ERROR i.a.w.g.DefaultReplicationWorker(replicate):279 - Sync worker failed.
io.airbyte.workers.exception.WorkerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "io.airbyte.workers.process.KubePortManagerSingleton.take()" is null
So it looks like the PR helped but didn't fix the issue entirely.
We also encountered this issue a couple times now (on v0.40.22). Restarting the workers helps, but it's only a temporary solution. We would like to upgrade and have been waiting for a version where a fix for this issue has been included
@olivermeyer are you still running into the issue or did you find a way to fix it?
Upgrading the chart to v0.45.35 fixed the issue for us.
@benmoriceau I think there was a PR to fix it right? Can you link the work and close the issue?
On chart version v0.45.0 the issue has also not been reproducing for a while, thank you.
Issue resolved since Airbyte: v0.45.0
🚀
YES 👍 I have already opened a PR here to add better logging to the KubePortManagerSingleton class, as there seems to be very little logging, which makes things very difficult to triage.
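A minimal sketch of the kind of occupancy logging that would help here (illustrative only, not the actual PR or the real KubePortManagerSingleton code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative port pool that reports its occupancy on every take/offer.
public class LoggingPortPool {

    private static final Logger LOGGER = LoggerFactory.getLogger(LoggingPortPool.class);

    private final BlockingQueue<Integer> available = new LinkedBlockingDeque<>();

    public Integer take() throws InterruptedException {
        final Integer port = available.poll(10, TimeUnit.MINUTES);
        LOGGER.info("Took port {}; {} ports still available.", port, available.size());
        return port;
    }

    public void offer(final int port) {
        if (!available.contains(port)) {
            available.add(port);
        }
        LOGGER.info("Returned port {}; {} ports now available.", port, available.size());
    }
}
```

With that in the logs, it should be easy to see whether the available count trends toward zero over the days leading up to the failures.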