airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.75k stars 4.04k forks source link

Source MySQL: MySQL to ClickHouse sync fails - Incremental | Append or Deduped + history #11087

Open arashlayeghi opened 2 years ago

arashlayeghi commented 2 years ago
## Environment - **Airbyte version**: 0.35.47-alpha - **OS Version / Instance**: AWS EC2 - **Deployment**: Docker - **Source Connector and version**: MySQL (0.5.6) - **Destination Connector and version**: Clickhouse (0.1.4) - **Source name/version**: MySQL 8.0.28 - **Destination name/version**: Clickhouse 22.1.3.7 - **Severity**: Critical - **Step where error happened**: Deploy / Create a MySQL source with either STANDARD or CDC Replication method / Create a new Clickhouse destination / Setup a new Connection between MySQL and ClickHouse with Sync mode of Incremental Deduped + history OR Incremental Append / Choose transformation: Raw data (JSON) / Press Sync now [Slack issue conversation link.](https://airbytehq.slack.com/archives/C01MFR03D5W/p1646834971350699) ## Current Behavior *It is failed with logs-5808.txt for Incremental Deduped + history and with logs-5810.txt for Incremental Append while the Reset data for both are succeeded. It is worth mentioning that with other sync modes: Full Refresh | Overwrite and Full Refresh | Append it is working fine. You can see below the table we are trying to sync with DESCRIBE trips command. ONE IMPORTANT NOTE is that all these happened to one of our instances which is recently upgraded from MySQL 5.7.33 to MySQL 8.0.28. I tested it with another instance with MySQL 5.7.33 and it worked fine at least without any normalization (Raw data (JSON))* [logs-5808.txt](https://github.com/airbytehq/airbyte/files/8239070/logs-5808.txt) [logs-5810.txt](https://github.com/airbytehq/airbyte/files/8239072/logs-5810.txt) ## Expected Behavior *It should sync with Incremental Deduped + history with either STANDARD or CDC Replication method.* ## Logs [logs-5808.txt](https://github.com/airbytehq/airbyte/files/8239070/logs-5808.txt) [logs-5810.txt](https://github.com/airbytehq/airbyte/files/8239072/logs-5810.txt) ![image](https://user-images.githubusercontent.com/22353246/158048459-581f1549-6d74-4c64-8b50-a17262d87e8d.png)
LOG ``` replace this with your long log output here ```
## Steps to Reproduce Deploy / Create a MySQL source with either STANDARD or CDC Replication method / Create a new Clickhouse destination / Setup a new Connection between MySQL and ClickHouse with Sync mode of Incremental Deduped + history OR Incremental Append / Choose transformation: Raw data (JSON) / Press Sync now
prateekmukhedkar commented 1 year ago

@arashlayeghi could you check with the latest MySQL Source Connector version and see if you are still running into this error? Thank you

squreshicc commented 1 year ago

@prateekmukhedkar I have tried just now. The problem persists. I think problem is with the destination clickhouse connector, because data is being fetched correctly from mysql source. It fails when it runs dbt normalization models on destination clickhouse database. For example, tables do not have the column _airbyte_unique_key created in them, which was supposed to be created automatically at the time of the sync. temp

amit-cashify commented 11 months ago

Any update on this? I am also facing this issue.

2023-10-17 06:55:13 normalization > 06:55:13 Completed with 1 error and 0 warnings: 2023-10-17 06:55:13 normalization > 06:55:13
2023-10-17 06:55:13 normalization > 06:55:13 Database Error in model pdd_request_scd (models/generated/airbyte_incremental/scd/airbyte_stage_db/pdd_request_scd.sql) 2023-10-17 06:55:13 normalization > 06:55:13 :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404) 2023-10-17 06:55:13 normalization > 06:55:13 Code: 47. DB::Exception: Missing columns: 'inc_data._ab_cdc_log_file' 'inc_data._ab_cdc_log_pos' 'inc_data._ab_cdc_cursor' while processing query: 'SELECT _airbyte_pdd_request_hashid, status_code, method, pdd_id, inc_data._ab_cdc_log_pos AS

2023-10-17 06:55:16 normalization > 06:55:12.117260 [debug] [Thread-1 (]: Timing info for model.airbyte_utils.pdd_request_scd (execute): 2023-10-17 06:55:12.062759 => 2023-10-17 06:55:12.117175 2023-10-17 06:55:16 normalization > 06:55:12.119386 [debug] [Thread-1 (]: Database Error in model pdd_request_scd (models/generated/airbyte_incremental/scd/airbyte_stage_db/pdd_request_scd.sql) 2023-10-17 06:55:16 normalization > :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404) 2023-10-17 06:55:16 normalization > Code: 47. DB::Exception: Missing columns: 'inc_data._ab_cdc_log_file' 'inc_data._ab_cdc_log_pos' 'inc_data._ab_cdc_cursor' while processing query: 'SELECT _airbyte_pdd_request_hashid, status_code, method, pdd_id, inc_data._ab_cdc_log_pos AS 2023-10-17 06:55:16 normalization > 06:55:12.119979 [error] [Thread-1 (]: 6 of 9 ERROR creating sql incremental model airbyte_stage_db.pdd_request_scd ........................................... [ERROR in 0.13s]

2023-10-17 06:55:16 normalization > 06:55:13.471437 [info ] [MainThread]: Completed with 1 error and 0 warnings: 2023-10-17 06:55:16 normalization > 06:55:13.471743 [info ] [MainThread]: 2023-10-17 06:55:16 normalization > 06:55:13.472043 [error] [MainThread]: Database Error in model pdd_request_scd (models/generated/airbyte_incremental/scd/airbyte_stage_db/pdd_request_scd.sql) 2023-10-17 06:55:16 normalization > 06:55:13.472410 [error] [MainThread]: :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404) 2023-10-17 06:55:16 normalization > 06:55:13.472752 [error] [MainThread]: Code: 47. DB::Exception: Missing columns: 'inc_data._ab_cdc_log_file' 'inc_data._ab_cdc_log_pos' 'inc_data._ab_cdc_cursor' while processing query: 'SELECT _airbyte_pdd_request_hashid, status_code, method, pdd_id, inc_data._ab_cdc_log_pos AS

06:55:12.119979 [error] [Thread-1 (]: 6 of 9 ERROR creating sql incremental model airbyte_stage_db.pdd_request_scd ........................................... [ERROR in 0.13s] 06:55:13.472043 [error] [MainThread]: Database Error in model pdd_request_scd (models/generated/airbyte_incremental/scd/airbyte_stage_db/pdd_request_scd.sql) 06:55:13.472410 [error] [MainThread]: :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404) 06:55:13.472752 [error] [MainThread]: Code: 47. DB::Exception: Missing columns: 'inc_data._ab_cdc_log_file' 'inc_data._ab_cdc_log_pos' 'inc_data._ab_cdc_cursor' while processing query: 'SELECT _airbyte_pdd_request_hashid, status_code, method, pdd_id, inc_data._ab_cdc_log_pos AS,retryable=,timestamp=1697525716373,additionalProperties={}], io.airbyte.config.FailureReason@4fb84519[failureOrigin=normalization,failureType=system_error,internalMessage=[0m06:55:13.472410 [error] [MainThread]: :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404),externalMessage=Normalization failed during the dbt run. This may indicate a problem with the data itself.,metadata=io.airbyte.config.Metadata@c98d1ce[additionalProperties={attemptNumber=5, jobId=100, from_trace_message=true}],stacktrace=AirbyteDbtError: 06:55:12.119979 [error] [Thread-1 (]: 6 of 9 ERROR creating sql incremental model airbyte_stage_db.pdd_request_scd ........................................... [ERROR in 0.13s] 06:55:13.472043 [error] [MainThread]: Database Error in model pdd_request_scd (models/generated/airbyte_incremental/scd/airbyte_stage_db/pdd_request_scd.sql) 06:55:13.472410 [error] [MainThread]: :HTTPDriver for http://172.16.X.YYY:8123 returned response code 404) 06:55:13.472752 [error] [MainThread]: Code: 47. DB::Exception: Missing columns: 'inc_data._ab_cdc_log_file' 'inc_data._ab_cdc_log_pos' 'inc_data._ab_cdc_cursor' while processing query: 'SELECT _airbyte_pdd_request_hashid, status_code, method, pdd_id, inc_data._ab_cdc_log_pos AS,retryable=,timestamp=1697525716373,additionalProperties={}]],additionalProperties={}] 2023-10-17 06:55:16 INFO i.a.c.i.LineGobbler(voidCall):149 - 2023-10-17 06:55:16 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- END DEFAULT NORMALIZATION ----- 2023-10-17 06:55:16 INFO i.a.c.i.LineGobbler(voidCall):149 - 2023-10-17 06:55:16 INFO i.a.w.t.s.a.AppendToAttemptLogActivityImpl(log):56 - Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=10, successiveCompleteFailures=5, totalCompleteFailures=5, successivePartialFailures=0, totalPartialFailures=1) Backoff before next attempt: 13 minutes 30 seconds