airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

BigQuery SCD tables don't reset #15097

Closed isaacharrisholt closed 2 years ago

isaacharrisholt commented 2 years ago
## Environment

- **Airbyte version**: 0.39.37-alpha
- **OS Version / Instance**: AWS EC2 (Linux AMI)
- **Deployment**: Docker
- **Destination Connector and version**: BigQuery 1.1.11
- **Step where error happened**: Reset

## Current Behavior

When running a reset against our BigQuery tables, the `_scd` tables used for the 'Incremental deduped + history' mode aren't cleared. This means that, when the next sync completes, all the old data is added back to the reset tables, which can lead to dirty data etc. if there have been issues previously.

## Expected Behavior

The `_scd` tables should also be reset so that old history is not included in further syncs.

## Steps to Reproduce

1. Run an incremental deduped sync into BigQuery
2. Delete/change the source data
3. Reset the destination tables
4. Sync again
5. Old data will be present in BigQuery

## Are you willing to submit a PR?

No.

From [this Discourse thread](https://discuss.airbyte.io/t/snowflake-scd-table-not-empty-after-reset/1798), it seems the same happens with Snowflake, so it's possibly a larger problem.
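
Until the reset clears them, one possible manual workaround is to empty the `_scd` tables by hand between steps 3 and 4. The sketch below only builds the SQL text and is not an Airbyte feature. It assumes each history table is named `<stream>_scd` in the destination dataset; the function name and the project, dataset, and stream names are hypothetical placeholders. Review the generated statements before running them in BigQuery.

```python
from typing import List


def scd_truncate_statements(project: str, dataset: str, streams: List[str]) -> List[str]:
    """Build one BigQuery TRUNCATE TABLE statement per stream's history table.

    Assumption: the history table for a stream is named "<stream>_scd" and
    lives in the given project and dataset.
    """
    statements = []
    for stream in streams:
        # Backticks quote the fully qualified BigQuery table name.
        table = f"`{project}.{dataset}.{stream}_scd`"
        statements.append(f"TRUNCATE TABLE {table};")
    return statements


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for sql in scd_truncate_statements("my-project", "airbyte_raw", ["users", "orders"]):
        print(sql)
```

The statements can be pasted into the BigQuery console or run with any client after the reset and before the next sync, so that old history is not replayed into the freshly reset tables.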
marcosmarxm commented 2 years ago

Duplicate of https://github.com/airbytehq/airbyte/issues/5417