airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Add append_dedupe feature for the MySQL destination #14822

Open persunde opened 2 years ago

persunde commented 2 years ago

Environment

Current Behavior

"Incremental Sync - Deduped History" does NOT show up in the "Set up connection" page if I try to setup a connection

Where is "Incremental Sync - Deduped History"?

But "Incremental Sync - Deduped History" is selectable if you do Postgres to Postgres or MySQL to Postgres.

Expected Behavior

Expected "Incremental Sync - Deduped History" to be one of the "Sync Mode" options in the dropdown menus for each table I want to sync.

Logs

I can maybe give logs later.

Steps to Reproduce

  1. Create Postgres Source
  2. Create MySQL Destination
  3. Create a new Connection with Postgres as source, and MySQL as destination
  4. Try to change the "Sync mode" for the tables. "Incremental Sync - Deduped History" does NOT show up as a possible option. Only "Full Refresh Overwrite", "Full Refresh Append" and "Incremental Append" show up.

Are you willing to submit a PR?

Maybe for another issue in the future.

See here for Slack thread: https://airbytehq.slack.com/archives/C021JANJ6TY/p1657876134958799

octavia-squidington-iii commented 2 years ago

cc @airbytehq/frontend

natalyjazzviolin commented 2 years ago

Hello @persunde, thanks for submitting this issue! The logs would definitely be helpful for debugging this, could you please add them?

natalyjazzviolin commented 2 years ago

@persunde, just got a note from @tealjulia: the documentation is wrong, and unfortunately the MySQL destination doesn't support the Incremental Sync - Deduped History sync mode.
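
For context, a destination connector declares the sync modes it accepts in its specification, and the connection UI appears to offer only the modes found there. The following is a minimal sketch of that check, assuming the `supported_destination_sync_modes` field of the connector spec. The two spec fragments are hypothetical values chosen to match the behavior described in this thread, not copied from the actual connectors.

```python
# Hypothetical spec fragments (assumption): values mirror what this thread
# reports, they are not copied from the real MySQL/Postgres destination specs.
mysql_spec = {"supported_destination_sync_modes": ["overwrite", "append"]}
postgres_spec = {
    "supported_destination_sync_modes": ["overwrite", "append", "append_dedup"]
}


def supports_dedup(spec: dict) -> bool:
    """Return True if the destination spec lists the append_dedup mode."""
    return "append_dedup" in spec.get("supported_destination_sync_modes", [])


print("MySQL supports dedup:", supports_dedup(mysql_spec))
print("Postgres supports dedup:", supports_dedup(postgres_spec))
```

If a destination's spec does not list `append_dedup`, no UI setting will make the deduped mode appear; support has to be added in the connector itself.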

persunde commented 2 years ago

@natalyjazzviolin are there any plans to add the "Incremental Sync - Deduped History" sync mode to MySQL? Or a mode that only overwrites rows where the ID is the same between source and destination?

We want to transfer data from source to destination. The source will delete old data, but we want to keep that old data in the destination. I am not sure whether what I am asking for is covered here:

"In the future: We will consider making other flavors of full refresh configurable as first-class citizens in Airbyte. e.g. On new data, copy old data to a new table with a timestamp, and then replace the original table with the new data. As always, we will focus on adding these options in such a way that the behavior of each connector is both well documented and predictable."

https://docs.airbyte.com/understanding-airbyte/connections/full-refresh-overwrite#in-the-future
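
To make the requested behavior concrete, here is a minimal sketch of append-plus-dedup semantics: incoming rows overwrite destination rows with the same ID, and rows that were deleted at the source stay in the destination. This is an illustration only, not Airbyte's implementation. The function name `append_dedup` and the field names `id` and `updated_at` are assumptions made for the example, and it models only the deduplicated final table, not any history table.

```python
from typing import Any

Record = dict[str, Any]


def append_dedup(
    destination: dict[Any, Record],
    batch: list[Record],
    primary_key: str = "id",          # assumed primary key column
    cursor_field: str = "updated_at",  # assumed cursor column
) -> dict[Any, Record]:
    """Merge a batch into the destination, keeping the newest row per key."""
    merged = dict(destination)  # copy so the input is not mutated
    for record in batch:
        key = record[primary_key]
        current = merged.get(key)
        # Insert new keys; replace existing rows only with newer-or-equal data.
        if current is None or record[cursor_field] >= current[cursor_field]:
            merged[key] = record
    return merged


# Row 1 was deleted at the source, so it is absent from the new batch.
destination = {
    1: {"id": 1, "updated_at": 1, "name": "old"},
    2: {"id": 2, "updated_at": 1, "name": "b"},
}
batch = [
    {"id": 2, "updated_at": 2, "name": "b2"},
    {"id": 3, "updated_at": 2, "name": "c"},
]
result = append_dedup(destination, batch)
print(sorted(result))  # [1, 2, 3]: row 1 is kept, row 2 updated, row 3 added
```

Because the merge never removes keys, deletions at the source do not propagate, which matches the "keep the old data in the destination" requirement above.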

megan-starr9 commented 1 year ago

I would also love to see this added. It's a hugely common use case in MySQL, and I wouldn't think it would be much different from Postgres from a relational database standpoint, seeing as our tables have cursor fields and identities.

Commenting to watch this mainly, and as a reference. Thanks!

bleonard commented 1 year ago

Re-opening this to investigate the behavior.

bleonard commented 1 year ago

@grishick This is a feature request for the append_dedupe feature for the MySQL destination.

icgoco commented 1 year ago

Any news on this? I am trying to sync Facebook Insights to MySQL, but the Incremental - Append + Deduped mode is not available. Is there any way to fix it?

andreadna commented 7 months ago

Hi, any news? In version 0.50.54 OSS it's still not visible!