airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Running the launcher replication-orchestrator failed, `WorkerException` #39512

Open jeffsdata opened 5 months ago

jeffsdata commented 5 months ago

Helm Chart Version

Chart Version: 0.163.0 App Version: 0.63.0

What step the error happened?

During the Sync

Relevant information

We just installed Airbyte into a new environment. I am unable to successfully run any syncs - I've set up three different connections and none of them work.

A warning appears almost immediately: "Warning from replication: Something went wrong during replication", with `message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false`

Sometimes, seemingly at random, the sync manages to move data (the UI will say "1,000,000 records extracted | 700,000 records loaded"), but it never actually creates the final tables. Most attempts, however, produce no logs at all. Also, a note: this message surfaces as an ERROR in an old environment (app version 0.55.0) and as a warning in this new environment (app version 0.63.0), but both environments hit it.

I read some of the past commentary about this, and I think we're going to try setting `CONTAINER_ORCHESTRATOR_ENABLED=false`. Just figured I'd record this problem.
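For anyone else trying the same workaround: one way to pass that environment variable through the Helm chart is via the worker's `extraEnv` list. This is a sketch only; the exact key names depend on your chart version, so verify against the chart's `values.yaml` before applying.

```yaml
# values.yaml fragment (key names are assumptions; check your chart version)
worker:
  extraEnv:
    - name: CONTAINER_ORCHESTRATOR_ENABLED
      value: "false"
```

Then redeploy with something like `helm upgrade airbyte airbyte/airbyte -f values.yaml`.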

Relevant log output

In my most recent tests (after I restarted everything), there are no logs, but here's an old run where it also happened. 

2024-06-14 19:46:33 INFO i.m.r.Micronaut(start):100 - Startup completed in 4829ms. Server Running: http://orchestrator-repl-job-3-attempt-5:9000
2024-06-14 19:46:05 platform > Cloud storage job log path: /workspace/3/5/logs.log
2024-06-14 19:46:05 platform > Executing worker wrapper. Airbyte version: 0.63.0
2024-06-14 19:46:05 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:05 platform > 
2024-06-14 19:46:05 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:05 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:05 platform > ----- START CHECK -----
2024-06-14 19:46:05 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:05 platform > 
2024-06-14 19:46:05 platform > Attempting to start pod = source-google-search-console-check-3-5-geepv for airbyte/source-google-search-console:1.4.4 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@45073f8f[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@6fe4538e[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@392c3d9f[cpuRequest=0.25,cpuLimit=2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@2327767f[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@2327767f[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts io.airbyte.config.AllowedHosts@cd200a3[hosts=[*.googleapis.com, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-06-14 19:46:05 platform > source-google-search-console-check-3-5-geepv stdoutLocalPort = 9028
2024-06-14 19:46:05 platform > source-google-search-console-check-3-5-geepv stderrLocalPort = 9029
2024-06-14 19:46:05 platform > Creating stdout socket server...
2024-06-14 19:46:05 platform > Creating stderr socket server...
2024-06-14 19:46:05 platform > Creating pod source-google-search-console-check-3-5-geepv...
2024-06-14 19:46:05 platform > Waiting for init container to be ready before copying files...
2024-06-14 19:46:06 platform > Init container ready..
2024-06-14 19:46:06 platform > Copying files...
2024-06-14 19:46:06 platform > Uploading file: source_config.json
2024-06-14 19:46:06 platform > kubectl cp /tmp/bd02ef4a-aed8-4a42-8841-954641a056ef/source_config.json airbyte/source-google-search-console-check-3-5-geepv:/config/source_config.json -c init --retries=3
2024-06-14 19:46:06 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:07 platform > kubectl cp complete, closing process
2024-06-14 19:46:07 platform > Uploading file: FINISHED_UPLOADING
2024-06-14 19:46:07 platform > kubectl cp /tmp/b49e03a3-13b5-491f-97f6-bab3f37cf9ad/FINISHED_UPLOADING airbyte/source-google-search-console-check-3-5-geepv:/config/FINISHED_UPLOADING -c init --retries=3
2024-06-14 19:46:07 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:07 platform > kubectl cp complete, closing process
2024-06-14 19:46:07 platform > Waiting until pod is ready...
2024-06-14 19:46:08 platform > Setting stdout...
2024-06-14 19:46:08 platform > Setting stderr...
2024-06-14 19:46:09 platform > Reading pod IP...
2024-06-14 19:46:09 platform > Pod IP: 172.17.20.251
2024-06-14 19:46:09 platform > Using null stdin output stream...
2024-06-14 19:46:09 platform > Reading messages from protocol version 0.2.0
2024-06-14 19:46:10 platform > Check succeeded
2024-06-14 19:46:11 platform > (pod: airbyte / source-google-search-console-check-3-5-geepv) - Closed all resources for pod
2024-06-14 19:46:11 platform > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@43313f83[status=succeeded,message=<null>,additionalProperties={}]
2024-06-14 19:46:11 platform > 
2024-06-14 19:46:11 platform > ----- END CHECK -----
2024-06-14 19:46:11 platform > 
2024-06-14 19:46:11 platform > Cloud storage job log path: /workspace/3/5/logs.log
2024-06-14 19:46:11 platform > Executing worker wrapper. Airbyte version: 0.63.0
2024-06-14 19:46:11 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:11 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:11 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:11 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:11 platform > Attempting to start pod = destination-mssql-check-3-5-tipxu for airbyte/destination-mssql:1.0.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@6ca0af67[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@6fe4538e[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@392c3d9f[cpuRequest=0.25,cpuLimit=2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@2327767f[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@2327767f[cpuRequest=0.1,cpuLimit=2.0,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-06-14 19:46:11 platform > destination-mssql-check-3-5-tipxu stdoutLocalPort = 9034
2024-06-14 19:46:11 platform > destination-mssql-check-3-5-tipxu stderrLocalPort = 9035
2024-06-14 19:46:11 platform > Creating stdout socket server...
2024-06-14 19:46:11 platform > 
2024-06-14 19:46:11 platform > ----- START CHECK -----
2024-06-14 19:46:11 platform > 
2024-06-14 19:46:11 platform > Creating pod destination-mssql-check-3-5-tipxu...
2024-06-14 19:46:11 platform > Creating stderr socket server...
2024-06-14 19:46:11 platform > Waiting for init container to be ready before copying files...
2024-06-14 19:46:12 platform > Init container ready..
2024-06-14 19:46:12 platform > Copying files...
2024-06-14 19:46:12 platform > Uploading file: source_config.json
2024-06-14 19:46:12 platform > kubectl cp /tmp/f7dfefcb-45fb-4a4d-b6bb-627a957b3da7/source_config.json airbyte/destination-mssql-check-3-5-tipxu:/config/source_config.json -c init --retries=3
2024-06-14 19:46:12 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:12 platform > kubectl cp complete, closing process
2024-06-14 19:46:12 platform > Uploading file: FINISHED_UPLOADING
2024-06-14 19:46:12 platform > kubectl cp /tmp/ad56f775-0a7b-457f-91c0-9c8e9044de9f/FINISHED_UPLOADING airbyte/destination-mssql-check-3-5-tipxu:/config/FINISHED_UPLOADING -c init --retries=3
2024-06-14 19:46:12 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:13 platform > kubectl cp complete, closing process
2024-06-14 19:46:13 platform > Waiting until pod is ready...
2024-06-14 19:46:14 platform > Setting stdout...
2024-06-14 19:46:14 platform > Setting stderr...
2024-06-14 19:46:15 platform > Reading pod IP...
2024-06-14 19:46:15 platform > Pod IP: 172.17.20.252
2024-06-14 19:46:15 platform > Using null stdin output stream...
2024-06-14 19:46:15 platform > Reading messages from protocol version 0.2.0
2024-06-14 19:46:15 platform > INFO main i.a.i.d.m.MSSQLDestination(main):132 starting destination: class io.airbyte.integrations.destination.mssql.MSSQLDestination
2024-06-14 19:46:15 platform > INFO main i.a.c.i.b.IntegrationCliParser$Companion(parseOptions):146 integration args: {check=null, config=source_config.json}
2024-06-14 19:46:15 platform > INFO main i.a.c.i.b.IntegrationRunner(runInternal):123 Running integration: io.airbyte.cdk.integrations.base.ssh.SshWrappedDestination
2024-06-14 19:46:15 platform > INFO main i.a.c.i.b.IntegrationRunner(runInternal):124 Command: CHECK
2024-06-14 19:46:15 platform > INFO main i.a.c.i.b.IntegrationRunner(runInternal):125 Integration config: IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
2024-06-14 19:46:15 platform > WARN main c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-06-14 19:46:15 platform > WARN main c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-06-14 19:46:15 platform > INFO main i.a.c.i.b.s.SshTunnel$Companion(getInstance):433 Starting connection with method: NO_TUNNEL
2024-06-14 19:46:15 platform > INFO main c.z.h.HikariDataSource(<init>):79 HikariPool-1 - Starting...
2024-06-14 19:46:15 platform > INFO main c.z.h.HikariDataSource(<init>):81 HikariPool-1 - Start completed.
2024-06-14 19:46:16 platform > INFO main c.z.h.HikariDataSource(close):349 HikariPool-1 - Shutdown initiated...
2024-06-14 19:46:16 platform > INFO main c.z.h.HikariDataSource(close):351 HikariPool-1 - Shutdown completed.
2024-06-14 19:46:16 platform > INFO main i.a.c.i.b.IntegrationRunner(runInternal):252 Completed integration: io.airbyte.cdk.integrations.base.ssh.SshWrappedDestination
2024-06-14 19:46:16 platform > INFO main i.a.i.d.m.MSSQLDestination(main):134 completed destination: class io.airbyte.integrations.destination.mssql.MSSQLDestination
2024-06-14 19:46:17 platform > (pod: airbyte / destination-mssql-check-3-5-tipxu) - Closed all resources for pod
2024-06-14 19:46:17 platform > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@29e74e4b[status=succeeded,message=<null>,additionalProperties={}]
2024-06-14 19:46:17 platform > 
2024-06-14 19:46:17 platform > ----- END CHECK -----
2024-06-14 19:46:17 platform > 
2024-06-14 19:46:18 platform > Cloud storage job log path: /workspace/3/5/logs.log
2024-06-14 19:46:18 platform > Executing worker wrapper. Airbyte version: 0.63.0
2024-06-14 19:46:18 platform > Creating orchestrator-repl-job-3-attempt-5 for attempt number: 5
2024-06-14 19:46:18 platform > There are currently running pods for the connection: [destination-mssql-write-3-4-cgiqn, orchestrator-repl-job-3-attempt-4, source-google-search-console-read-3-4-asrdf]. Killing these pods to enforce one execution at a time.
2024-06-14 19:46:18 platform > Attempting to delete pods: [destination-mssql-write-3-4-cgiqn, orchestrator-repl-job-3-attempt-4, source-google-search-console-read-3-4-asrdf]
2024-06-14 19:46:18 platform > Waiting for deletion...
2024-06-14 19:46:19 platform > There are currently running pods for the connection: [destination-mssql-write-3-4-cgiqn, orchestrator-repl-job-3-attempt-4, source-google-search-console-read-3-4-asrdf]. Killing these pods to enforce one execution at a time.
2024-06-14 19:46:19 platform > Attempting to delete pods: [destination-mssql-write-3-4-cgiqn, orchestrator-repl-job-3-attempt-4, source-google-search-console-read-3-4-asrdf]
2024-06-14 19:46:19 platform > Waiting for deletion...
2024-06-14 19:46:20 platform > Successfully deleted all running pods for the connection!
2024-06-14 19:46:20 platform > Waiting for pod to be running...
2024-06-14 19:46:21 platform > Pod airbyte/orchestrator-repl-job-3-attempt-5 is running on 172.17.18.156
2024-06-14 19:46:21 platform > Uploading file: envMap.json
2024-06-14 19:46:21 platform > kubectl cp /tmp/ff07a0f9-8ee6-4303-b5f3-0d4cf7496abc/envMap.json airbyte/orchestrator-repl-job-3-attempt-5:/config/envMap.json -c init --retries=3
2024-06-14 19:46:21 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:21 platform > kubectl cp complete, closing process
2024-06-14 19:46:21 platform > Uploading file: application.txt
2024-06-14 19:46:21 platform > kubectl cp /tmp/a2a66806-71b7-4943-a223-be882de5b570/application.txt airbyte/orchestrator-repl-job-3-attempt-5:/config/application.txt -c init --retries=3
2024-06-14 19:46:21 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:22 platform > kubectl cp complete, closing process
2024-06-14 19:46:22 platform > Uploading file: jobRunConfig.json
2024-06-14 19:46:22 platform > kubectl cp /tmp/f3664d88-da4b-46d6-864a-778a8e556848/jobRunConfig.json airbyte/orchestrator-repl-job-3-attempt-5:/config/jobRunConfig.json -c init --retries=3
2024-06-14 19:46:22 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:22 platform > kubectl cp complete, closing process
2024-06-14 19:46:22 platform > Uploading file: destinationLauncherConfig.json
2024-06-14 19:46:22 platform > kubectl cp /tmp/1927f76c-dedc-4c10-af2b-910ad1fbbfc7/destinationLauncherConfig.json airbyte/orchestrator-repl-job-3-attempt-5:/config/destinationLauncherConfig.json -c init --retries=3
2024-06-14 19:46:22 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:23 platform > kubectl cp complete, closing process
2024-06-14 19:46:23 platform > Uploading file: sourceLauncherConfig.json
2024-06-14 19:46:23 platform > kubectl cp /tmp/9337ce68-87b9-4563-8809-653396b4faaf/sourceLauncherConfig.json airbyte/orchestrator-repl-job-3-attempt-5:/config/sourceLauncherConfig.json -c init --retries=3
2024-06-14 19:46:23 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:24 platform > kubectl cp complete, closing process
2024-06-14 19:46:24 platform > Uploading file: input.json
2024-06-14 19:46:24 platform > kubectl cp /tmp/0610d8ed-740b-49fe-a4cb-75216a8856c3/input.json airbyte/orchestrator-repl-job-3-attempt-5:/config/input.json -c init --retries=3
2024-06-14 19:46:24 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:24 platform > kubectl cp complete, closing process
2024-06-14 19:46:24 platform > Uploading file: KUBE_POD_INFO
2024-06-14 19:46:24 platform > kubectl cp /tmp/ea95ce2f-819f-403d-961a-ada10dc767a4/KUBE_POD_INFO airbyte/orchestrator-repl-job-3-attempt-5:/config/KUBE_POD_INFO -c init --retries=3
2024-06-14 19:46:24 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:25 platform > kubectl cp complete, closing process
2024-06-14 19:46:25 platform > Uploading file: FINISHED_UPLOADING
2024-06-14 19:46:25 platform > kubectl cp /tmp/6177124a-26ee-4812-88d1-66235f6a5acc/FINISHED_UPLOADING airbyte/orchestrator-repl-job-3-attempt-5:/config/FINISHED_UPLOADING -c init --retries=3
2024-06-14 19:46:25 platform > Waiting for kubectl cp to complete
2024-06-14 19:46:25 platform > kubectl cp complete, closing process
2024-06-14 19:46:36 replication-orchestrator > Writing async status INITIALIZING for KubePodInfo[namespace=airbyte, name=orchestrator-repl-job-3-attempt-5, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.63.0, pullPolicy=IfNotPresent]]...
2024-06-14 19:46:34 INFO i.a.f.ConfigFileClient(<init>):105 - path /flags does not exist, will return default flag values
2024-06-14 19:46:35 WARN i.a.m.l.MetricClientFactory(initialize):72 - MetricClient was not recognized or not provided. Accepted values are `datadog` or `otel`. 
2024-06-14 19:46:37 replication-orchestrator > sourceLauncherConfig is: io.airbyte.persistence.job.models.IntegrationLauncherConfig@297454f7[jobId=3,attemptId=5,connectionId=dd6bda3e-62d0-4e1a-9c17-202805465fa8,workspaceId=05475012-6438-4b97-969f-e4004bdc62c9,dockerImage=airbyte/source-google-search-console:1.4.4,normalizationDockerImage=<null>,supportsDbt=false,normalizationIntegrationType=<null>,protocolVersion=Version{version='0.2.0', major='0', minor='2', patch='0'},isCustomConnector=false,allowedHosts=io.airbyte.config.AllowedHosts@44ec4a38[hosts=[*.googleapis.com, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}],additionalEnvironmentVariables=<null>,additionalLabels=<null>,priority=<null>,additionalProperties={}]
2024-06-14 19:46:37 replication-orchestrator > Concurrent stream read enabled? false
2024-06-14 19:46:37 replication-orchestrator > Setting up source...
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-06-14 19:46:37 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-06-14 19:46:37 replication-orchestrator > Setting up destination...
2024-06-14 19:46:37 replication-orchestrator > Setting up replication worker...
2024-06-14 19:46:37 replication-orchestrator > Using BoundedConcurrentLinkedQueue
2024-06-14 19:46:37 replication-orchestrator > Using BoundedConcurrentLinkedQueue
2024-06-14 19:46:38 replication-orchestrator > Running replication worker...
2024-06-14 19:46:38 replication-orchestrator > start sync worker. job id: 3 attempt id: 5
2024-06-14 19:46:38 replication-orchestrator > 
2024-06-14 19:46:38 replication-orchestrator > ----- START REPLICATION -----
2024-06-14 19:46:38 replication-orchestrator > 
2024-06-14 19:46:38 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:38 replication-orchestrator > Running destination...
2024-06-14 19:46:38 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:38 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-06-14 19:46:38 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-06-14 19:46:38 replication-orchestrator > Attempting to start pod = source-google-search-console-read-3-5-cxekk for airbyte/source-google-search-console:1.4.4 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@143f7ede[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@f0abe17[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@6da8d54f[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=null, stdOut=io.airbyte.config.ResourceRequirements@3bf374f0[cpuRequest=0.2,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts io.airbyte.config.AllowedHosts@44ec4a38[hosts=[*.googleapis.com, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-06-14 19:46:38 replication-orchestrator > Attempting to start pod = destination-mssql-write-3-5-xsoll for airbyte/destination-mssql:1.0.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@2b723a40[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@f0abe17[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@1c6d8bd3[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@6b536608[cpuRequest=0.1,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@51c9f118[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-06-14 19:46:38 replication-orchestrator > source-google-search-console-read-3-5-cxekk stdoutLocalPort = 9877
2024-06-14 19:46:38 replication-orchestrator > destination-mssql-write-3-5-xsoll stdoutLocalPort = 9878
2024-06-14 19:46:38 replication-orchestrator > source-google-search-console-read-3-5-cxekk stderrLocalPort = 9879
2024-06-14 19:46:38 replication-orchestrator > destination-mssql-write-3-5-xsoll stderrLocalPort = 9880
2024-06-14 19:46:38 replication-orchestrator > Creating stdout socket server...
2024-06-14 19:46:38 replication-orchestrator > Creating stdout socket server...
2024-06-14 19:46:38 replication-orchestrator > Creating stderr socket server...
2024-06-14 19:46:38 replication-orchestrator > Creating stderr socket server...
2024-06-14 19:46:38 replication-orchestrator > Creating pod destination-mssql-write-3-5-xsoll...
2024-06-14 19:46:38 replication-orchestrator > Creating pod source-google-search-console-read-3-5-cxekk...
2024-06-14 19:46:39 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-06-14 19:46:39 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-06-14 20:01:40 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):308 - Segment analytic client flush complete.
2024-06-14 20:01:40 INFO i.a.a.SegmentAnalyticsClient(close):244 - Segment analytics client closed.  No new events will be accepted.
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / destination-mssql-write-3-5-xsoll) - Destroying Kube process.
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / source-google-search-console-read-3-5-cxekk) - Destroying Kube process.
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / destination-mssql-write-3-5-xsoll) - Closed all resources for pod
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / destination-mssql-write-3-5-xsoll) - Destroyed Kube process.
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / source-google-search-console-read-3-5-cxekk) - Closed all resources for pod
2024-06-14 20:01:39 replication-orchestrator > (pod: airbyte / source-google-search-console-read-3-5-cxekk) - Destroyed Kube process.
2024-06-14 20:01:39 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-06-14 20:01:39 replication-orchestrator > sync summary: {
  "status" : "failed",
  "startTime" : 1718394398059,
  "endTime" : 1718395299974,
  "totalStats" : {
    "bytesEmitted" : 0,
    "destinationStateMessagesEmitted" : 0,
    "destinationWriteEndTime" : 0,
    "destinationWriteStartTime" : 1718394398142,
    "meanSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBeforeSourceStateMessageEmitted" : 0,
    "meanSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "recordsEmitted" : 0,
    "replicationEndTime" : 1718395299968,
    "replicationStartTime" : 1718394398059,
    "sourceReadEndTime" : 0,
    "sourceReadStartTime" : 1718394398142,
    "sourceStateMessagesEmitted" : 0
  },
  "streamStats" : [ ],
  "performanceMetrics" : {
    "processFromSource" : {
      "elapsedTimeInNanos" : 0,
      "executionCount" : 0,
      "avgExecTimeInNanos" : "NaN"
    },
    "readFromSource" : {
      "elapsedTimeInNanos" : 0,
      "executionCount" : 0,
      "avgExecTimeInNanos" : "NaN"
    },
    "processFromDest" : {
      "elapsedTimeInNanos" : 0,
      "executionCount" : 0,
      "avgExecTimeInNanos" : "NaN"
    },
    "writeToDest" : {
      "elapsedTimeInNanos" : 0,
      "executionCount" : 0,
      "avgExecTimeInNanos" : "NaN"
    },
    "readFromDest" : {
      "elapsedTimeInNanos" : 0,
      "executionCount" : 0,
      "avgExecTimeInNanos" : "NaN"
    }
  }
}
2024-06-14 20:01:39 replication-orchestrator > failures: [ {
  "failureOrigin" : "replication",
  "internalMessage" : "io.airbyte.workers.exception.WorkerException: Failed to create pod for write step",
  "externalMessage" : "Something went wrong during replication",
  "metadata" : {
    "attemptNumber" : 5,
    "jobId" : 3
  },
  "stacktrace" : "java.lang.RuntimeException: io.airbyte.workers.exception.WorkerException: Failed to create pod for write step\n\tat io.airbyte.workers.general.ReplicationWorkerHelper.startDestination(ReplicationWorkerHelper.kt:215)\n\tat io.airbyte.workers.general.BufferedReplicationWorker.lambda$run$0(BufferedReplicationWorker.java:170)\n\tat io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsync$2(BufferedReplicationWorker.java:235)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: io.airbyte.workers.exception.WorkerException: Failed to create pod for write step\n\tat io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:197)\n\tat io.airbyte.workers.process.AirbyteIntegrationLauncher.write(AirbyteIntegrationLauncher.java:265)\n\tat io.airbyte.workers.internal.DefaultAirbyteDestination.start(DefaultAirbyteDestination.java:110)\n\tat io.airbyte.workers.general.ReplicationWorkerHelper.startDestination(ReplicationWorkerHelper.kt:213)\n\t... 6 more\nCaused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[destination-mssql-write-3-5-xsoll] in namespace [airbyte].\n\tat io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:946)\n\tat io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98)\n\tat io.airbyte.workers.process.KubePodProcess.waitForInitPodToRun(KubePodProcess.java:394)\n\tat io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:669)\n\tat io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:193)\n\t... 9 more\n",
  "timestamp" : 1718395299910
}, {
  "failureOrigin" : "replication",
  "internalMessage" : "io.airbyte.workers.exception.WorkerException: Failed to create pod for read step",
  "externalMessage" : "Something went wrong during replication",
  "metadata" : {
    "attemptNumber" : 5,
    "jobId" : 3
  },
  "stacktrace" : "java.lang.RuntimeException: io.airbyte.workers.exception.WorkerException: Failed to create pod for read step\n\tat io.airbyte.workers.general.ReplicationWorkerHelper.startSource(ReplicationWorkerHelper.kt:233)\n\tat io.airbyte.workers.general.BufferedReplicationWorker.lambda$run$1(BufferedReplicationWorker.java:171)\n\tat io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsync$2(BufferedReplicationWorker.java:235)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: io.airbyte.workers.exception.WorkerException: Failed to create pod for read step\n\tat io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:197)\n\tat io.airbyte.workers.process.AirbyteIntegrationLauncher.read(AirbyteIntegrationLauncher.java:227)\n\tat io.airbyte.workers.internal.DefaultAirbyteSource.start(DefaultAirbyteSource.java:93)\n\tat io.airbyte.workers.general.ReplicationWorkerHelper.startSource(ReplicationWorkerHelper.kt:231)\n\t... 6 more\nCaused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[source-google-search-console-read-3-5-cxekk] in namespace [airbyte].\n\tat io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:946)\n\tat io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98)\n\tat io.airbyte.workers.process.KubePodProcess.waitForInitPodToRun(KubePodProcess.java:394)\n\tat io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:669)\n\tat io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:193)\n\t... 9 more\n",
  "timestamp" : 1718395299933
} ]
2024-06-14 20:01:39 replication-orchestrator > Returning output...
2024-06-14 20:01:39 replication-orchestrator > 
2024-06-14 20:01:39 replication-orchestrator > ----- END REPLICATION -----
2024-06-14 20:01:39 replication-orchestrator > 
2024-06-14 20:01:40 replication-orchestrator > Writing async status SUCCEEDED for KubePodInfo[namespace=airbyte, name=orchestrator-repl-job-3-attempt-5, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.63.0, pullPolicy=IfNotPresent]]...
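For anyone scripting over these attempt outputs: each entry in the `failures` array is plain JSON once the log prefix is stripped, and the real root cause is buried in the last `Caused by:` line of the `stacktrace` field (here, the fabric8 pod-creation timeout). A minimal sketch, with field names copied from the output above and the stacktrace trimmed for illustration:

```python
import json

# One entry from the orchestrator's "failures" array, as in the log above
# (stacktrace trimmed to the lines that matter for this sketch).
failure_json = '''
{
  "failureOrigin": "replication",
  "internalMessage": "io.airbyte.workers.exception.WorkerException: Failed to create pod for write step",
  "externalMessage": "Something went wrong during replication",
  "metadata": {"attemptNumber": 5, "jobId": 3},
  "stacktrace": "java.lang.RuntimeException: io.airbyte.workers.exception.WorkerException: Failed to create pod for write step\\nCaused by: io.airbyte.workers.exception.WorkerException: Failed to create pod for write step\\nCaused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[destination-mssql-write-3-5-xsoll] in namespace [airbyte]."
}
'''

def root_cause(failure: dict) -> str:
    """Return the innermost 'Caused by:' line of the stacktrace, falling
    back to internalMessage when no cause lines are present."""
    causes = [line for line in failure["stacktrace"].splitlines()
              if line.startswith("Caused by:")]
    return causes[-1] if causes else failure["internalMessage"]

failure = json.loads(failure_json)
print(root_cause(failure))
# Prints the fabric8 KubernetesClientTimeoutException line, i.e. the
# 15-minute (900000 ms) pod-startup timeout that actually failed the sync.
```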
jeffsdata commented 5 months ago

Tried a different connector - GA4 to CSV - and that didn't change anything.

marcosmarxm commented 5 months ago

@jeffsdata, did you upgrade from a previous version? This error has previously occurred when the platform couldn't reach Minio for state storage or the configured log bucket.

jeffsdata commented 5 months ago

This wasn't an upgrade scenario. Another team member created the environment, so I don't know exactly what they did, but my assumption is they ran a fresh install using the most recent Helm chart (since that's the app version installed). I'll check with them to find out whether they did anything unusual, though.

jeffsdata commented 4 months ago

Turns out, this was just a problem with the Helm chart version. They updated to the most recent version, redeployed, and that seems to have fixed this.

jeffsdata commented 3 months ago

This issue has come back for me. I'm working with our internal team, but just wanted to update that it worked for about 3 weeks, and then the last 3 days it's just been throwing the original error -- no logs, just: message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false

daniel-ro commented 3 months ago

@marcosmarxm Same for me. It happens with airbyte/source-facebook-marketing v3.3.6; Airbyte itself is 0.63.4. The error is:

message='io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false
jeffsdata commented 3 months ago

We ended up clearing our log files in "airbyte storage" and that got our jobs running again. It was about half full - we'd allocated 1GB and it had about 475MB worth of logs.
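For anyone who wants to try the same workaround: if you're on the default MinIO-backed storage, the log bucket can be inspected and pruned with the MinIO client. This is a hedged sketch, not an official procedure -- the service name `airbyte-minio-svc`, the credentials, and the bucket path `airbyte-storage/job-logging` are assumptions; check `global.storage` in your Helm values for the actual names.

```shell
# Port-forward the MinIO service from the Airbyte namespace (service name assumed)
kubectl -n airbyte port-forward svc/airbyte-minio-svc 9000:9000 &

# Point the MinIO client at it (minio/minio123 are the default dev credentials)
mc alias set airbyte http://localhost:9000 minio minio123

# See how much space the log prefix is using, then prune logs older than 30 days
mc du airbyte/airbyte-storage/job-logging
mc rm --recursive --force --older-than 30d airbyte/airbyte-storage/job-logging
```

Pruning by age rather than deleting the whole prefix keeps recent job logs available for debugging while freeing the bulk of the space.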

daniel-ro commented 3 months ago

@jeffsdata Thanks for the update. Unfortunately, it didn't work for us.

@marcosmarxm any idea what could cause this? might be related to temporal state somehow?

silva-vinicius commented 3 months ago

We're facing the same issue here. We migrated from a previous version (v0.50.35) to the latest (v0.63.14) and now none of our connections work. In addition to the logs posted by @jeffsdata, we're also getting this message:

Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[source-declarative-manifest-read-2018-2-bclrt] in namespace [airbyte-abctl].

kosuke-zhang commented 3 months ago

+1

agonbina commented 3 months ago

Same issue here. It starts happening even on a fresh install, after any subsequent helm upgrade. Quite embarrassing to be fair 🤦‍♂️

Full Exception:

io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-slack-check-8-0-icvod.
    at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46)
    at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38)
    at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source)
    at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source)
    at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456)
    at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129)
    at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61)
    at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44)
    at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138)
    at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source)
    at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
    at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
    at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193)
    at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
    at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126)
    at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84)
    at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
    at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4395)
    at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50)
    at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28)
    at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12)
    at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64)
    at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43)
    at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39)
    at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78)
    at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107)
    at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124)
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278)
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243)
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216)
    at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-slack-check-8-0-icvod.
    at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:266)
    at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:207)
    at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44)
    at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24)
    at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42)
    ... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.43.0.1:443/api/v1/namespaces/airbyte/pods/source-slack-check-8-0-icvod?fieldManager=fabric8. Message: Unauthorized. Received status: Status(apiVersion=v1, code=401, details=null, kind=Status, message=Unauthorized, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Unauthorized, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:419)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:397)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handlePatch(BaseOperation.java:764)
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:231)
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:236)
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:251)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1179)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:98)
    at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:55)
    at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:50)
    at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:253)
    at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
    at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
    at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
    at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
    at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
    at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
    at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:253)
    at io.airbyte.workload.launcher.pods.KubePodLauncher.create(KubePodLauncher.kt:50)
    at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:263)
    ... 57 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.43.0.1:443/api/v1/namespaces/airbyte/pods/source-slack-check-8-0-icvod?fieldManager=fabric8. Message: Unauthorized. Received status: Status(apiVersion=v1, code=401, details=null, kind=Status, message=Unauthorized, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Unauthorized, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:660)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:640)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:589)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:549)
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:142)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:51)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:136)
    ... 3 more

I see a 401 Unauthorized error buried in the massive stack trace.

EDIT: The only way I can get things working again is a helm uninstall followed by a fresh helm install. That only makes sense if your persistence (Postgres and storage/S3) is decoupled and external.
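For reference, this is roughly the cycle I run. A sketch only -- the release name `airbyte`, namespace, and `values.yaml` path are assumptions, and again this is only safe when Postgres and log storage are external to the release:

```shell
# Tear down the release; external Postgres and S3/MinIO state are untouched
helm uninstall airbyte -n airbyte

# Reinstall from the same values file; secrets and service accounts are recreated fresh
helm repo update
helm install airbyte airbyte/airbyte -n airbyte -f values.yaml
```

Since the stack trace shows the workload launcher getting a 401 from the Kubernetes API, a lighter-touch thing worth trying first (untested on my side) is restarting just that deployment so it picks up a fresh service-account token: `kubectl -n airbyte rollout restart deployment airbyte-workload-launcher`.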

daniel-ro commented 3 months ago

For us the solution was to increase worker replicas to 2 × the number of simultaneous connections, as suggested somewhere deep in the docs.
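For anyone looking for where that knob lives: the replica count is set under the `worker` section of the Helm values. A sketch, assuming 5 connections may sync at the same time (check your chart version for the exact key name):

```yaml
# values.yaml -- sized per the 2 x simultaneous-syncs rule of thumb
worker:
  replicaCount: 10  # 2 * 5 concurrent connections
```

Apply it with `helm upgrade airbyte airbyte/airbyte -n airbyte -f values.yaml` and confirm the new pods come up before retrying syncs.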