airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[Kubeprocess] NPE exception whilst checking pod status #36819

Open sivankumar86 opened 6 months ago

sivankumar86 commented 6 months ago

Topic

A Kubernetes sync fails with an NPE, and the failure is transient: a retry succeeds. I am not sure what the actual issue is. Is it a pod creation timeout?

Relevant information

k8s : Airbyte version 0.55.2

The check connection failed because of an internal error in the scheduler used by airbyte.

Error:

    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$1(SharedProcessor.java:110) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.lambda$execute$0(SerialExecutor.java:58) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.scheduleNext(SerialExecutor.java:76) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.execute(SerialExecutor.java:70) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.distribute(SharedProcessor.java:107) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorStore.retainAll(ProcessorStore.java:114) ~[kubernetes-client-6.5.1.jar:?]
    at io.fabric8.kubernetes.client.informers.impl.cache.Reflector.lambda$listSyncAndWatch$3(Reflector.java:120) ~[kubernetes-client-6.5.1.jar:?]
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) ~[?:?]
    at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$5(StandardHttpClient.java:120) ~[kubernetes-client-api-6.5.1.jar:?]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) ~[?:?]
    at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:52) ~[kubernetes-client-api-6.5.1.jar:?]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179) ~[?:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:135) ~[kubernetes-httpclient-okhttp-6.5.1.jar:?]
    ... 3 more
    Suppressed: java.lang.Throwable: waiting here
            at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:174) ~[kubernetes-client-api-6.5.1.jar:?]
            at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:887) ~[kubernetes-client-6.5.1.jar:?]
            at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:93) ~[kubernetes-client-6.5.1.jar:?]
            at io.airbyte.workers.process.KubePodProcess.waitForInitPodToTerminate(KubePodProcess.java:403) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.process.KubePodProcess.copyFilesToKubeConfigVolume(KubePodProcess.java:352) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:660) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:193) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:149) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:71) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.55.2.jar:?]
            at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:142) ~[io.airbyte-airbyte-workers-0.55.2.jar:?]
            at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:226) ~[io.airbyte-airbyte-workers-0.55.2.jar:?]
            at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.55.2.jar:?]
            at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:211) ~[io.airbyte-airbyte-workers-0.55.2.jar:?]
            at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
            at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
            at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
            at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
            at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

    2024-04-03 20:11:01 platform >
    2024-04-03 20:11:01 platform > ----- END CHECK -----
    2024-04-03 20:11:01 platform >

I think it comes from here:

https://github.com/airbytehq/airbyte-platform/blob/7c5419626b4c285a4f0a842076e52b6324bfffe8/airbyte-commons-worker/src/main/java/io/airbyte/workers/process/KubePodProcess.java#L411

marcosmarxm commented 6 months ago

@sivankumar86 can you update the issue with the steps you are running? Which Airbyte version, cloud provider, and Kubernetes version are you using? Please also share the values you're using. This will help us understand what is causing the issue.

sivankumar86 commented 6 months ago

@marcosmarxm Thank you for taking a look. It is a Helm chart deployment running on Azure AKS.

Helm chart version: 0.63.0, k8s 1.26.0

Root cause: I was setting up a new sync with "snowflake" as the destination. The Snowflake check pod was failing because of a permission problem; however, the error reported by the Airbyte pod was a misleading NPE.

Solution: I granted the required permission to the Snowflake Airbyte user, and the issue seems resolved.

This is low priority for now; however, Airbyte should emit the actual error instead of an NPE.
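To illustrate the kind of behavior requested above, here is a minimal sketch of a null-tolerant status check that reports what was actually observed instead of throwing an NPE. It assumes, without confirmation from this thread, that the NPE comes from dereferencing a missing pod or init container status inside the wait predicate. The `Pod` and `ContainerStatus` records and all method names are hypothetical stand-ins, not fabric8 or Airbyte types.

```java
import java.util.List;
import java.util.Optional;

public class InitPodStatusSketch {
    // Hypothetical stand-ins for the Kubernetes client's pod model (not fabric8 classes).
    public record ContainerStatus(String name, Integer exitCode, String reason) {}
    public record Pod(String name, List<ContainerStatus> initContainerStatuses) {}

    // Returns the first init container status, or empty if the pod or its statuses are missing.
    public static Optional<ContainerStatus> firstInitStatus(Pod pod) {
        if (pod == null || pod.initContainerStatuses() == null
                || pod.initContainerStatuses().isEmpty()) {
            return Optional.empty();
        }
        return Optional.ofNullable(pod.initContainerStatuses().get(0));
    }

    // Wait predicate: true only once an exit code has been reported; never throws on missing data.
    public static boolean initContainerTerminated(Pod pod) {
        return firstInitStatus(pod).map(s -> s.exitCode() != null).orElse(false);
    }

    // Builds a message describing what was observed, to log or wrap instead of a bare NPE.
    public static String describeFailure(Pod pod) {
        if (pod == null) {
            return "pod was deleted or never observed before the init container finished";
        }
        return firstInitStatus(pod)
            .map(s -> "init container '" + s.name() + "' in pod '" + pod.name()
                + "': exitCode=" + s.exitCode() + ", reason=" + s.reason())
            .orElse("pod '" + pod.name() + "' reported no init container status yet");
    }

    public static void main(String[] args) {
        // A pod whose init container exited with a non-zero code (values are made up).
        Pod failed = new Pod("check-pod", List.of(new ContainerStatus("init", 1, "Error")));
        System.out.println(initContainerTerminated(failed));
        System.out.println(describeFailure(failed));
        // A pod that disappeared before any status was seen.
        System.out.println(describeFailure(null));
    }
}
```

With a check like this, a pod that fails or is removed early would produce a descriptive message for the job log rather than a stack trace ending in an NPE. Whether this matches the actual code path in `KubePodProcess` would need to be verified against the source.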

octavia-squidington-iii commented 1 week ago

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.