airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.27k stars 4.15k forks

Re-sync in case of stale offset #48578

Closed rodireich closed 2 days ago

rodireich commented 3 days ago

What

With this change, the MySQL source correctly performs a re-sync when a saved CDC offset is found to be stale and the connector is configured to re-sync rather than fail.


How

CdcPartitionsCreator goes into the following flow:

  1. Round 1: All streams are rolled back to their initial state so the next sync attempt starts over.
  2. Round 2: A transient exception is thrown so the platform kicks off another sync attempt.
  3. On the next attempt, all streams are rebuilt from scratch.
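
The two-round flow above can be sketched roughly as follows. This is an illustrative Python model, not the connector's actual code: `PartitionsCreator`, `StateManager`, `TransientSyncError`, `run_round`, and the returned strings are all hypothetical names chosen for the sketch, and the non-retryable failure path is an assumption about what "fail" means when re-sync is not configured.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TransientSyncError(Exception):
    """Hypothetical stand-in for an error the platform treats as retryable."""


class ResetPhase(Enum):
    NONE = auto()               # no reset has happened in this attempt
    STATE_ROLLED_BACK = auto()  # round 1 finished: stream states were reset


@dataclass
class StateManager:
    """Hypothetical, minimal state holder keyed by stream name."""
    stream_states: dict = field(default_factory=dict)

    def reset_all(self) -> None:
        # Roll every stream back to an initial (empty) state.
        self.stream_states = dict.fromkeys(self.stream_states)


@dataclass
class PartitionsCreator:
    state: StateManager
    offset_is_stale: bool
    resync_on_stale_offset: bool
    phase: ResetPhase = ResetPhase.NONE

    def run_round(self) -> str:
        if self.phase is ResetPhase.STATE_ROLLED_BACK:
            # Round 2: force the platform to start a fresh sync attempt.
            raise TransientSyncError("stale CDC offset; retrying from scratch")
        if self.offset_is_stale:
            if not self.resync_on_stale_offset:
                # Assumed behavior when the connector is configured to fail.
                raise RuntimeError("stale CDC offset and re-sync is disabled")
            # Round 1: discard all saved state so the next attempt starts over.
            self.state.reset_all()
            self.phase = ResetPhase.STATE_ROLLED_BACK
            return "state-reset"
        # Normal path, including the attempt that follows a reset.
        return "sync"
```

In this model, the attempt after the transient error constructs a fresh `PartitionsCreator` over the already-reset state, so it takes the normal path and rebuilds every stream from scratch.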

Review guide

  1. CdcPartitionsCreator - new handling of the re-sync flow in 2 rounds.
  2. StateManager - adds a way to reset stream states so the sync can roll back and start over.
  3. MySqlDebeziumOperations - identifies a stale offset when the connector is configured to re-sync, and triggers the new flow.
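
As a rough illustration of the decision in item 3, the sketch below treats a saved offset as stale when the binlog file it points to is no longer available on the server. That staleness test is an assumption made for the example; the PR description does not spell out the exact check. `StaleOffsetBehavior`, `OffsetDecision`, and `decide` are hypothetical names.

```python
from enum import Enum


class StaleOffsetBehavior(Enum):
    """Hypothetical connector setting for handling a stale offset."""
    FAIL = "fail"
    RESYNC = "resync"


class OffsetDecision(Enum):
    CONTINUE = "continue"  # offset is usable, resume CDC from it
    RESYNC = "resync"      # offset is stale, trigger the re-sync flow
    FAIL = "fail"          # offset is stale, surface an error


def decide(saved_binlog_file: str,
           available_binlog_files: set,
           behavior: StaleOffsetBehavior) -> OffsetDecision:
    # Assumption: the offset is stale if its binlog file was purged.
    if saved_binlog_file in available_binlog_files:
        return OffsetDecision.CONTINUE
    if behavior is StaleOffsetBehavior.RESYNC:
        return OffsetDecision.RESYNC
    return OffsetDecision.FAIL
```

Keeping the decision as a pure function of the saved offset, the server's available logs, and the configured behavior makes each of the three outcomes easy to unit test without a live database.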
rodireich commented 3 days ago

While verifying this PR, I noticed another bug that is not directly related to re-sync but concerns how empty tables save their state.

~I'm fixing it, but please take a look in the meantime~

Issue fixed in https://github.com/airbytehq/airbyte/pull/48593 branched out of this PR.