jeffsdata opened 5 months ago
Tried a different connector - GA4 to CSV - and that didn't change anything.
@jeffsdata, did you upgrade from a previous version? This error previously occurred when the state storage couldn't find MinIO or the configured log bucket.
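If you want to rule that out, something like the commands below can show whether the worker sees a bucket configured and whether MinIO is actually running. The names assume a helm release called airbyte in the airbyte namespace; adjust them to your install.

    kubectl -n airbyte get deploy airbyte-worker -o yaml | grep -iE 'bucket|minio'
    kubectl -n airbyte get pods | grep -i minio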
This wasn't an upgrade scenario. Another team member created the environment, so I don't know exactly what they did -- but my assumption is they ran a fresh install using the most up-to-date Helm chart provided (since that is the app version installed). I'll check with them to find out if they did anything weird, though.
Turns out, this was just a problem with the Helm chart version. They updated to the most recent version, redeployed, and that seems to have fixed this.
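For anyone else hitting this, the redeploy was roughly the following (release name, namespace, and values file are from our setup, so treat them as placeholders):

    helm repo add airbyte https://airbytehq.github.io/helm-charts   # if the repo isn't added yet
    helm repo update
    helm upgrade --install airbyte airbyte/airbyte -n airbyte -f values.yaml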
This issue has come back for me. I'm working with our internal team, but just wanted to update that it worked for about 3 weeks, and then the last 3 days it's just been throwing the original error -- no logs, just: message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false
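Since the UI shows no logs at all, the only place I've found any detail so far is the workload-launcher and worker pods themselves, roughly like this (deployment names assume a release called airbyte; check kubectl -n airbyte get deploy for yours):

    kubectl -n airbyte logs deploy/airbyte-workload-launcher --since=1h | grep -i -B2 -A10 'replication-orchestrator'
    kubectl -n airbyte logs deploy/airbyte-worker --since=1h | grep -i -A5 'WorkerException'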
@marcosmarxm same for me, happens with airbyte/source-facebook-marketing v3.3.6
Airbyte itself is 0.63.4.
The error is:
message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false
We ended up clearing our log files in "airbyte storage" and that got our jobs running again. It was about half full - we'd allocated 1GB and it had about 475MB worth of logs.
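For reference, this is roughly what we ran. The pod, bucket, and prefix names are from our install (the bundled MinIO), so treat them as an example rather than a recipe:

    # open a shell in the MinIO pod
    kubectl -n airbyte exec -it airbyte-minio-0 -- /bin/sh
    # inside the pod: point the mc client at the local server and delete old job logs
    # (mc was available in our MinIO image; credentials came from the pod's env vars)
    mc alias set local http://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
    mc rm --recursive --force local/airbyte-storage/job-logging/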
@jeffsdata thanks for updating. Unfortunately, it didn't work for us.
@marcosmarxm any idea what could cause this? Might it be related to Temporal state somehow?
We're facing the same issue here. We migrated from a previous version (v0.50.35) to the latest (v0.63.14) and now none of our connections work. In addition to the logs posted by @jeffsdata, we're also getting this message:
Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[source-declarative-manifest-read-2018-2-bclrt] in namespace [airbyte-abctl].
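In case it helps anyone debug the timeout, the pod the launcher is waiting on can be inspected directly; the namespace and pod name below are taken from the error message above:

    kubectl -n airbyte-abctl get pods | grep source-declarative-manifest
    kubectl -n airbyte-abctl describe pod source-declarative-manifest-read-2018-2-bclrt
    kubectl -n airbyte-abctl get events --sort-by=.lastTimestamp | tail -n 30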
+1
Same issue here. It starts even on a new install, after any subsequent helm upgrade. Quite embarrassing, to be fair 🤦♂️
Full Exception:
io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-slack-check-8-0-icvod. at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source) at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129) at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61) at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44) at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193) at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) at reactor.core.publisher.Mono.subscribe(Mono.java:4552) at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126) at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84) at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55) at reactor.core.publisher.Mono.subscribe(Mono.java:4552) at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634) at reactor.core.publisher.Mono.subscribe(Mono.java:4395) at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50) at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28) at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12) at 
io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39) at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78) at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1583) Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-slack-check-8-0-icvod. at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:266) at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:207) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ... 53 more Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.43.0.1:443/api/v1/namespaces/airbyte/pods/source-slack-check-8-0-icvod?fieldManager=fabric8. Message: Unauthorized. Received status: Status(apiVersion=v1, code=401, details=null, kind=Status, message=Unauthorized, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Unauthorized, status=Failure, additionalProperties={}). 
at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:419) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:397) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handlePatch(BaseOperation.java:764) at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:231) at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:236) at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:251) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1179) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:98) at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:55) at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:50) at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:253) at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) at dev.failsafe.Functions.lambda$get$0(Functions.java:46) at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:253) at io.airbyte.workload.launcher.pods.KubePodLauncher.create(KubePodLauncher.kt:50) at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:263) ... 57 more Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.43.0.1:443/api/v1/namespaces/airbyte/pods/source-slack-check-8-0-icvod?fieldManager=fabric8. Message: Unauthorized. Received status: Status(apiVersion=v1, code=401, details=null, kind=Status, message=Unauthorized, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Unauthorized, status=Failure, additionalProperties={}). 
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:660) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:640) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:589) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:549) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:142) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:51) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:136) ... 3 more
I see a 401 authorization issue somewhere in the massive stack trace.
EDIT: the only way I can get things working again is if I do a helm uninstall and then a helm install again. This only makes sense if your persistence (Postgres and storage/S3) is decoupled and external.
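Concretely, the workaround looks like this for us; release name, namespace, and values file are from our setup, and again this is only safe because our database and log storage live outside the chart:

    helm -n airbyte uninstall airbyte
    helm -n airbyte install airbyte airbyte/airbyte -f values.yaml   # same values as before, external Postgres + S3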
For us, the solution was to increase worker replicas to 2 * the number of simultaneous connections, as suggested somewhere deep in the docs.
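In practice that was something like the command below (for example, up to 5 connections syncing at once -> 10 worker replicas). The worker.replicaCount key path is how we read the chart's values; double-check it against your chart version:

    helm -n airbyte upgrade airbyte airbyte/airbyte --reuse-values --set worker.replicaCount=10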
Helm Chart Version
Chart Version: 0.163.0
App Version: 0.63.0
What step the error happened?
During the Sync
Relevant information
We just installed Airbyte into a new environment. I am unable to successfully run any syncs - I've set up three different connections and none of them work.
A warning comes almost immediately:
Warning from replication: Something went wrong during replication
message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false
Sometimes, seemingly randomly, the sync is able to get data (and it'll say "1,000,000 records extracted | 700,000 records loaded") but it never actually creates the final tables. However, most of the attempts have no logs at all. Also just a note... this error is an "ERROR" in an old environment (app version 0.55.0) and a warning in this new environment (app version 0.63.0) - but both environments have this error.
I read some of the past commentary about this, and I think we're going to try setting CONTAINER_ORCHESTRATOR_ENABLED=false. Just figured I'd record this problem.
Relevant log output