airbytehq / write-for-the-community

Contribute and collaborate on educational content for the Airbyte Community.
MIT License

Show how to upsert deduplicated data into destination final normalized table #118

Open ChristopheDuong opened 2 years ago

ChristopheDuong commented 2 years ago

Request Details

Instead of implementing https://github.com/airbytehq/airbyte/issues/3487, we could write a publication on how to do it using custom transformations.

The idea here is the opposite of https://github.com/airbytehq/write-for-the-community/issues/108: in this case, an engineer wants only the latest snapshot of the data and does not care about tracking the different historical versions from slowly changing dimensions.

Since we implemented incremental dbt transformations in normalization, we could convert the standard models produced by append sync modes into a new upsert sync mode behavior by specifying unique_key attributes on the final models (in a custom dbt project).
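To make the difference between append and upsert concrete, here is a minimal sketch in plain Python. It is not Airbyte or dbt code; the `append` and `upsert` helpers and the sample rows are hypothetical, and the only assumption is that a unique key identifies a record and the newest incoming row for a key should win.

```python
from typing import Any

Row = dict[str, Any]


def append(final: list[Row], batch: list[Row]) -> list[Row]:
    """Append behavior: every incoming row is kept, so keys can repeat."""
    return final + batch


def upsert(final: list[Row], batch: list[Row], unique_key: str) -> list[Row]:
    """Upsert behavior: one row per unique key, later rows replace earlier ones."""
    merged = {row[unique_key]: row for row in final}
    for row in batch:
        merged[row[unique_key]] = row  # update if the key exists, insert otherwise
    return list(merged.values())


# Hypothetical sample data: id 2 changes, id 3 is new.
final = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
batch = [{"id": 2, "name": "bobby"}, {"id": 3, "name": "carol"}]

appended = append(final, batch)        # 4 rows, id 2 appears twice
upserted = upsert(final, batch, "id")  # 3 rows, id 2 holds the latest values
```

With append, the final table keeps both versions of id 2; with upsert, only the latest snapshot of each record remains, which is the behavior this issue asks for.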

arimbr commented 2 years ago

If we decide to implement this new sync mode in Airbyte, it could be called INCREMENTAL UPSERT / INCREMENTAL OVERWRITE.

To implement this mode, we could start from the dbt code generated by the INCREMENTAL APPEND sync mode, add the dbt unique_key config, and then delete the raw data table.
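The three steps (load raw data, merge on a unique key, delete the raw table) could be sketched roughly as below. This uses Python's built-in sqlite3 module as a stand-in for a real destination and for dbt, so it only illustrates the flow. The table names `raw_users` and `users`, the columns, and the `sync_batch` helper are all hypothetical, and SQLite's `INSERT OR REPLACE` is swapped in for whatever merge statement dbt would generate from `unique_key` on a given warehouse.

```python
import sqlite3


def sync_batch(conn: sqlite3.Connection, rows: list[tuple[int, str, int]]) -> None:
    """One hypothetical sync: load raw rows, upsert into the final table, drop raw."""
    # Step 1: land the incoming batch in a raw table.
    conn.execute("CREATE TABLE raw_users (id INTEGER, name TEXT, emitted_at INTEGER)")
    conn.executemany(
        "INSERT INTO raw_users (id, name, emitted_at) VALUES (?, ?, ?)", rows
    )
    # Step 2: upsert on the unique key (the PRIMARY KEY of the final table).
    # Rows are processed oldest first, so the newest row per id wins.
    conn.execute(
        """
        INSERT OR REPLACE INTO users (id, name, emitted_at)
        SELECT id, name, emitted_at FROM raw_users ORDER BY emitted_at
        """
    )
    # Step 3: delete the raw data table once the final table is up to date.
    conn.execute("DROP TABLE raw_users")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, emitted_at INTEGER)"
)

sync_batch(conn, [(1, "alice", 1), (2, "bob", 1)])
sync_batch(conn, [(2, "bobby", 2), (3, "carol", 2)])

final_rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
raw_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'raw_users'"
).fetchall()
```

After the second sync, the final table holds one row per id with the latest values, and no raw table is left behind.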

Alternative titles: