airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other

Scheduler with Cron doesn't trigger when needed #39801

Open KleyLimaa opened 2 weeks ago

KleyLimaa commented 2 weeks ago

Helm Chart Version

0.64.308

What step the error happened?

Other

Relevant information

I have some connections that need to run once per day, so I use a cron expression like 0 7 6 * * ?. On some days the job just doesn't trigger, and there's no failed attempt, so I can't see any clear reason.
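Airbyte's cron option appears to take Quartz-style expressions. The leading seconds field and the ? in the expression above fit that format. As a reading aid, here is a minimal sketch that labels each field of a 6-field Quartz-style expression. The helper name describe_quartz is hypothetical and is not part of Airbyte or Quartz. It ignores Quartz's optional seventh "year" field.

```python
# Field order used by Quartz-style cron expressions: seconds come first.
# (Quartz also allows an optional seventh "year" field, which this sketch ignores.)
QUARTZ_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_quartz(expr: str) -> dict:
    """Pair each field of a 6-field Quartz-style expression with its name."""
    parts = expr.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expr!r}")
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for name, value in describe_quartz("0 7 6 * * ?").items():
        print(f"{name:>12}: {value}")
```

Read this way, 0 7 6 * * ? asks for a run at 06:07:00 every day in the schedule's configured time zone. The ? means "no specific day of week". The expression itself therefore does not skip any days.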

Screenshot from 2024-06-20 09-08-08

In this Job History we can see some days without runs.
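One way to make gaps like these concrete is to compare the days on which a sync actually started against every calendar day in a window. This sketch assumes the run start dates were copied out of the job history by hand, and it does not call any Airbyte API. The function missing_days and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_days(run_dates: list, start: date, end: date) -> list:
    """Return each calendar day in [start, end] that has no recorded run."""
    seen = set(run_dates)
    missing = []
    day = start
    while day <= end:
        if day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    # Hypothetical history: runs recorded on June 14, 15, 17 and 19 only.
    runs = [date(2024, 6, d) for d in (14, 15, 17, 19)]
    print(missing_days(runs, date(2024, 6, 14), date(2024, 6, 19)))
```

A list of exact missing dates like this can make it easier to line up skipped runs with scheduler or Temporal logs from the same days.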

Relevant log output

No response

marcosmarxm commented 2 weeks ago

Hello @KleyLimaa, do you have other connections, or only this one? Maybe one that runs a long sync? If possible, more information about your deployment would be really helpful for reproducing the issue.

KleyLimaa commented 2 weeks ago

Hi @marcosmarxm, thanks for the reply. I have other connections with a similar configuration, one sync per day. This is the longest run I have at the moment.

Screenshot from 2024-06-24 09-32-32

The deployment runs on an EKS cluster, with just basic customization of the chart's values.

version: 0.64.308 appVersion: 0.58.1

brianstorti commented 1 week ago

I'm seeing a similar issue. I have a connection configured to run hourly, using this cron expression: 0 0 0/1 * * ?. For some reason, it sometimes skips a run.
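In Quartz-style syntax, an increment such as 0/1 in the hour field means "start at hour 0 and repeat every 1 hour". Under that reading, 0 0 0/1 * * ? should fire at minute 0 of every hour, 24 times a day. Below is a minimal sketch of that expansion. It handles only the numeric start/step form, not * or ranges. The helper expand_increment is hypothetical and is not part of Airbyte or Quartz.

```python
def expand_increment(field: str, max_value: int) -> list:
    """Expand a Quartz-style 'start/step' field into the values it matches.

    max_value is the largest legal value for the field (23 for hours,
    59 for minutes or seconds).
    """
    start_text, step_text = field.split("/")
    start, step = int(start_text), int(step_text)
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, max_value + 1, step))


if __name__ == "__main__":
    hours = expand_increment("0/1", 23)
    print(len(hours), hours[:3], hours[-1])  # 24 [0, 1, 2] 23
```

Because every hour from 0 to 23 matches, an hourly gap in the job history points to something other than the expression.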

Screenshot 2024-06-29 at 20 44 06

appVersion: 0.63.1 chart version: 0.199.0

Not sure if it's related to this issue, but I'm seeing this error in the cron pod:

2024-06-29 23:55:34 INFO i.a.c.t.ConnectionManagerUtils(safeTerminateWorkflow):139 - Attempting to terminate existing workflow for workflowId connection_manager_50a1be3e-d793-4aa0-8235-67ed3904585d.
2024-06-29 23:55:34 WARN i.a.c.t.ConnectionManagerUtils(safeTerminateWorkflow):143 - Could not terminate temporal workflow due to the following error; this may be because there is currently no running workflow for this connection.
io.temporal.client.WorkflowNotFoundException: workflowId='connection_manager_50a1be3e-d793-4aa0-8235-67ed3904585d', runId='}
        at io.temporal.client.WorkflowStubImpl.throwAsWorkflowFailureException(WorkflowStubImpl.java:516) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.client.WorkflowStubImpl.terminate(WorkflowStubImpl.java:411) ~[temporal-sdk-1.22.3.jar:?]
        at io.airbyte.commons.temporal.WorkflowClientWrapped.lambda$terminateWorkflow$3(WorkflowClientWrapped.java:83) ~[io.airbyte-airbyte-commons-temporal-core-0.63.1.jar:?]
        at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) ~[failsafe-3.3.2.jar:3.3.2]
        at dev.failsafe.Functions.lambda$get$0(Functions.java:46) ~[failsafe-3.3.2.jar:3.3.2]
        at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) ~[failsafe-3.3.2.jar:3.3.2]
        at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) ~[failsafe-3.3.2.jar:3.3.2]
        at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) ~[failsafe-3.3.2.jar:3.3.2]
        at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) ~[failsafe-3.3.2.jar:3.3.2]
        at io.airbyte.commons.temporal.RetryHelper.withRetries(RetryHelper.java:60) ~[io.airbyte-airbyte-commons-temporal-core-0.63.1.jar:?]
        at io.airbyte.commons.temporal.WorkflowClientWrapped.withRetries(WorkflowClientWrapped.java:123) ~[io.airbyte-airbyte-commons-temporal-core-0.63.1.jar:?]
        at io.airbyte.commons.temporal.WorkflowClientWrapped.terminateWorkflow(WorkflowClientWrapped.java:82) ~[io.airbyte-airbyte-commons-temporal-core-0.63.1.jar:?]
        at io.airbyte.commons.temporal.ConnectionManagerUtils.safeTerminateWorkflow(ConnectionManagerUtils.java:141) ~[io.airbyte-airbyte-commons-temporal-0.63.1.jar:?]
        at io.airbyte.commons.temporal.ConnectionManagerUtils.safeTerminateWorkflow(ConnectionManagerUtils.java:158) ~[io.airbyte-airbyte-commons-temporal-0.63.1.jar:?]
        at io.airbyte.commons.temporal.TemporalClient.lambda$restartClosedWorkflowByStatus$0(TemporalClient.java:119) ~[io.airbyte-airbyte-commons-temporal-0.63.1.jar:?]
        at java.base/java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
        at io.airbyte.commons.temporal.TemporalClient.restartClosedWorkflowByStatus(TemporalClient.java:118) ~[io.airbyte-airbyte-commons-temporal-0.63.1.jar:?]
        at io.airbyte.cron.jobs.SelfHealTemporalWorkflows.cleanTemporal(SelfHealTemporalWorkflows.java:42) ~[io.airbyte-airbyte-cron-0.63.1.jar:?]
        at io.airbyte.cron.jobs.$SelfHealTemporalWorkflows$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-cron-0.63.1.jar:?]
        at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) ~[micronaut-inject-4.4.10.jar:4.4.10]
        at io.micronaut.inject.DelegatingExecutableMethod.invoke(DelegatingExecutableMethod.java:86) ~[micronaut-inject-4.4.10.jar:4.4.10]
        at io.micronaut.context.bind.DefaultExecutableBeanContextBinder$ContextBoundExecutable.invoke(DefaultExecutableBeanContextBinder.java:152) ~[micronaut-inject-4.4.10.jar:4.4.10]
        at io.micronaut.scheduling.processor.ScheduledMethodProcessor.lambda$process$2(ScheduledMethodProcessor.java:131) ~[micronaut-context-4.4.10.jar:4.4.10]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358) ~[?:?]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: workflow execution already completed
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268) ~[grpc-stub-1.62.2.jar:1.62.2]
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249) ~[grpc-stub-1.62.2.jar:1.62.2]
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167) ~[grpc-stub-1.62.2.jar:1.62.2]
        at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.terminateWorkflowExecution(WorkflowServiceGrpc.java:4134) ~[temporal-serviceclient-1.22.3.jar:?]
        at io.temporal.internal.client.external.GenericWorkflowClientImpl.lambda$terminate$5(GenericWorkflowClientImpl.java:132) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.retryer.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:52) ~[temporal-serviceclient-1.22.3.jar:?]
        at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:69) ~[temporal-serviceclient-1.22.3.jar:?]
        at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60) ~[temporal-serviceclient-1.22.3.jar:?]
        at io.temporal.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:50) ~[temporal-serviceclient-1.22.3.jar:?]
        at io.temporal.internal.client.external.GenericWorkflowClientImpl.terminate(GenericWorkflowClientImpl.java:127) ~[temporal-sdk-1.22.3.jar:?]