airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination ElasticSearch: add support incremental sync “overwrite” mode #17594

Open marc-marketparts opened 1 year ago

marc-marketparts commented 1 year ago

Tell us about the problem you're trying to solve

We need to index millions of records from Snowflake into an Elasticsearch index. We expect an update to a record in Snowflake to update the corresponding document in the Elasticsearch index.

This behaviour is currently available in the Elasticsearch destination connector only for the "Full refresh - Overwrite" mode (if "UPSERT" mode is enabled in the connector settings, the table's primary key is used as the document id).
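As a rough illustration of what "using the primary key as the document id" means at the Elasticsearch level, the sketch below builds a Bulk API request body in which every `index` action carries an explicit `_id` derived from the record's primary key. The helper names, the `|`-joined composite key and the index name `products` are hypothetical choices for illustration; this is not the connector's actual code.

```python
import json
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def document_id(record: Record, primary_key: List[str]) -> str:
    """Hypothetical: derive a deterministic id from the primary key values."""
    return "|".join(str(record[field]) for field in primary_key)


def bulk_index_body(index: str, records: Iterable[Record], primary_key: List[str]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    An "index" action with an explicit _id replaces any existing document
    with that id, so re-sending a changed record updates it in place.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index, "_id": document_id(record, primary_key)}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(record))  # document source line
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [{"sku": "A1", "region": "EU", "price": 10}]
    print(bulk_index_body("products", rows, ["sku", "region"]), end="")
```

Joining key values with `|` is only a simple stand-in; a real implementation would need escaping or hashing to avoid collisions when key values contain the separator.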

We cannot afford a full refresh of the index on every sync, because the data volume makes it take too long for our business use case. We need to update the index incrementally, but Airbyte's incremental sync modes are currently restricted to Append (and Deduped for some connectors), which creates new documents in the Elasticsearch index instead of updating the corresponding ones.
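The difference between append and upsert can be shown with a toy in-memory model in which an "index" is just a dict from document id to document. This is a simplification, not Elasticsearch client code: the random `uuid4` ids stand in for the ids Elasticsearch generates when none is supplied, and the function names are hypothetical.

```python
import uuid
from typing import Any, Dict, List

Record = Dict[str, Any]
Index = Dict[str, Record]  # document id -> document source


def append_sync(index: Index, batch: List[Record]) -> None:
    """Append: records are indexed without an explicit id, so each one
    gets a fresh generated id and becomes a new document."""
    for record in batch:
        index[uuid.uuid4().hex] = dict(record)


def upsert_sync(index: Index, batch: List[Record], primary_key: List[str]) -> None:
    """Upsert: the primary key is the document id, so a changed record
    replaces the existing document instead of duplicating it."""
    for record in batch:
        doc_id = "|".join(str(record[field]) for field in primary_key)
        index[doc_id] = dict(record)


if __name__ == "__main__":
    first_sync = [{"order_id": 1, "status": "new"}]
    second_sync = [{"order_id": 1, "status": "shipped"}]  # same row, updated at the source

    appended: Index = {}
    append_sync(appended, first_sync)
    append_sync(appended, second_sync)

    upserted: Index = {}
    upsert_sync(upserted, first_sync, ["order_id"])
    upsert_sync(upserted, second_sync, ["order_id"])

    print(len(appended), len(upserted))  # 2 1
```

With append, the updated row shows up as a second document; with upsert, the single document simply carries the new status.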

A new "Incremental - Overwrite" sync mode would handle this use case (inserting new records and updating existing ones in the destination).

Once this mode is available, implementing it in the Elasticsearch destination connector should be easy, since it only requires passing the primary key as the document id (which is already done for the full refresh mode).

Describe the solution you’d like

Allow an incremental "overwrite" sync mode for connector developers, and let users choose the primary key in the UI.

N.B.: the docs define the Overwrite mode as "Overwrite by first deleting existing data in the destination." That would not always hold for an incremental mode: the destination may perform an update/insert instead of a delete/insert, and, being incremental, the new mode would not propagate deletions made in the source. So either the definition of "overwrite" has to be updated or a new name has to be found (e.g. "merge"). I lean toward the former.
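To make the semantic difference in the note concrete, here is a small in-memory sketch (a dict standing in for the index; all names are hypothetical) contrasting the documented delete-then-insert Overwrite with the proposed incremental merge. The assumption is that the incremental cursor only picks up rows that changed, so a row deleted at the source is never seen again.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]
Index = Dict[str, Record]  # document id -> document source


def doc_id(record: Record, primary_key: List[str]) -> str:
    return "|".join(str(record[field]) for field in primary_key)


def full_refresh_overwrite(index: Index, snapshot: List[Record], primary_key: List[str]) -> None:
    """Overwrite as documented: delete existing data, then load the full snapshot."""
    index.clear()
    for record in snapshot:
        index[doc_id(record, primary_key)] = dict(record)


def incremental_merge(index: Index, changed: List[Record], primary_key: List[str]) -> None:
    """Proposed incremental mode: insert or update only the changed records.
    Nothing is deleted, so rows removed at the source stay in the index."""
    for record in changed:
        index[doc_id(record, primary_key)] = dict(record)


if __name__ == "__main__":
    start: Index = {"1": {"id": 1, "price": 10}, "2": {"id": 2, "price": 20}}
    # At the source, id 1 was updated and id 2 was deleted.
    changed_rows = [{"id": 1, "price": 12}]

    overwritten = dict(start)
    full_refresh_overwrite(overwritten, changed_rows, ["id"])  # snapshot == remaining rows

    merged = dict(start)
    incremental_merge(merged, changed_rows, ["id"])

    print(sorted(overwritten))  # ['1']
    print(sorted(merged))       # ['1', '2']
```

Both paths update id 1, but only the full refresh removes id 2, which is why "overwrite" (delete first) is not an accurate description of the incremental behaviour.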

Describe the alternative you’ve considered or used

Additional context

Are you willing to submit a PR?

No

marcosmarxm commented 1 year ago

I added your request to connector destination backlog, thanks!