airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source MSSQL CDC connector hangs for long time and retry [blocker] #13595

Closed sivankumar86 closed 2 years ago

sivankumar86 commented 2 years ago

Environment
Airbyte version: 0.36.9
OS Version / Instance: AWS EC2
Deployment: Kubernetes deploy env
Source Connector and version: source-mssql 0.4.0
Step where error happened: Sync job

I enabled CDC on an MSSQL database and scheduled a sync every 15 minutes. A sync usually finishes in a couple of minutes, but sometimes the job gets stuck for more than 2 hours before the attempt is retried.

From logs:
2022-06-07 01:45:16 destination > 2022-06-07 01:45:16 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):65 - Airbyte message consumer: succeeded.
2022-06-07 01:46:12 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed. errors: .snapshot_isolation: is not defined in the schema and the schema does not allow additional properties, .replication_type: does not have a value in the enumeration [Standard]
2022-06-07 01:46:11 INFO i.a.w.p.KubePodProcess(close):710 - (pod: product-analytics / destination-snowflake-sync-4250-0-wfxrp) - Closed all resources for pod
2022-06-07 01:46:11 ERROR i.a.w.DefaultReplicationWorker(run):169 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.UncheckedIOException: java.net.SocketException: No route to host
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
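For anyone triaging a similar log file, here is a minimal sketch that pulls the ERROR entries out of lines shaped like the excerpt above. The regular expression is inferred only from those few lines and is an assumption about the format, not Airbyte's documented log layout. The names `LOG_LINE`, `parse_line` and `error_entries` are made up for illustration. Lines carrying a `destination >` prefix and stack-trace lines do not match and are skipped.

```python
import re
from datetime import datetime

# Assumed shape: "<date> <time> <LEVEL> <logger>(<method>):<line> - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>INFO|WARN|ERROR|DEBUG) "
    r"(?P<source>\S+) - (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a matching log line, or None for anything else."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "ts": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "level": match.group("level"),
        "source": match.group("source"),
        "message": match.group("message"),
    }


def error_entries(lines):
    """Keep only the parsed entries whose level is ERROR."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]
```

Calling `error_entries(open("logs-4250.txt"))` on the attached file would list the failing entries with their timestamps, which makes it easier to see how long the job sat idle before the failure was reported.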

logs-4250.txt

Current Behavior
The job hangs for a long time (more than 2 hours) before it fails and retries.

Expected Behavior
The job should fail fast and retry. Logs are attached.

Steps to Reproduce

This is a blocker: when something goes wrong, the job takes 3 hours to fail, and a CDC pipeline cannot wait that long. Please add a fail-fast option.
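To make the request concrete, here is a rough sketch of the fail-fast-and-retry behavior being asked for. It is not how Airbyte implements its workers. `run_with_deadline` and its parameters are hypothetical names, and the sketch only illustrates putting a deadline on each attempt so that a hung attempt is abandoned and retried quickly.

```python
import concurrent.futures
import time


def run_with_deadline(task, timeout_s, max_attempts=3, backoff_s=0.0):
    """Run task(), giving up on each attempt after timeout_s seconds.

    Hypothetical helper: a timed-out attempt is abandoned, not killed,
    because Python threads cannot be forcibly stopped. A real worker
    would terminate the underlying process or pod instead.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(task)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            last_error = TimeoutError(
                f"attempt {attempt} exceeded {timeout_s}s"
            )
        except Exception as exc:  # the task itself failed; retry it
            last_error = exc
        finally:
            # Do not block on a hung attempt while shutting the pool down.
            pool.shutdown(wait=False)
        time.sleep(backoff_s)
    raise last_error
```

With a 15-minute schedule, a per-attempt deadline of a few minutes would surface a `No route to host` style failure within one sync interval instead of after hours.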

sivankumar86 commented 2 years ago
[Screenshot: Screen Shot 2022-06-08 at 7 48 50 pm]

All the other runs took only a few minutes, but the failed one took 2 hours to retry. I checked container/worker memory and it peaked at only 20%, so this could be a transient issue. Even so, the job should fail fast and retry instead of hanging.

sivankumar86 commented 2 years ago

Just updating with the solution here: upgrading Airbyte to version 0.39.x solved the hanging issue.