airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.21k stars 4.14k forks

Source Redshift: add CDC loading method #12740

Open gauravtanwar03 opened 2 years ago

gauravtanwar03 commented 2 years ago

Tell us about the problem you're trying to solve

AWS Redshift is our data warehouse and powers our reverse ETL pipelines, acting as the source for all of them. Because we send data to external platforms through their APIs, we cannot afford to reprocess unchanged data in a daily full sync: every record counts against API calls and platform usage.

Describe the solution you’d like

Develop a CDC feature for Redshift, similar to the ones for MySQL and Postgres, so that changes to a table in Redshift can be synced to other operational platforms incrementally.

Describe the alternative you’ve considered or used

We currently apply filter conditions on certain columns, and we recently started using a data lakehouse such as Databricks, which provides such features through Delta Lake.

Additional context

Are you willing to submit a PR?

No

grishick commented 2 years ago

@gauravtanwar03

AWS Redshift is our data warehouse and powers our reverse ETL pipelines, acting as the source for all of them. Because we send data to external platforms through their APIs, we cannot afford to reprocess unchanged data in a daily full sync: every record counts against API calls and platform usage.

Have you tried using the incremental sync mode for the Redshift source?
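
For context, a cursor-based incremental read works roughly as sketched below. This is only an illustration of the general idea, not Airbyte's actual implementation: the `read_incremental` helper, the `customers` table, and the `updated_at` cursor column are all hypothetical, and an in-memory SQLite database stands in for Redshift so the snippet runs on its own.

```python
import sqlite3


def read_incremental(conn, table, cursor_field, last_cursor):
    """Return rows whose cursor_field is greater than last_cursor,
    together with the new cursor value to store as sync state.

    Hypothetical helper for illustration; table and cursor_field are
    assumed to be trusted identifiers, not user input.
    """
    cur = conn.execute(
        f'SELECT * FROM "{table}" WHERE "{cursor_field}" > ? '
        f'ORDER BY "{cursor_field}"',
        (last_cursor,),
    )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    # Keep the old cursor when nothing changed since the last sync.
    new_cursor = rows[-1][cursor_field] if rows else last_cursor
    return rows, new_cursor


# SQLite stands in for Redshift here (assumption for the demo only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ada", "2022-05-01T00:00:00"),
        (2, "Bo", "2022-05-02T00:00:00"),
    ],
)

# First sync: an empty cursor selects every row.
first, state = read_incremental(conn, "customers", "updated_at", "")

# One row changes, and its updated_at is bumped by the writer.
conn.execute(
    "UPDATE customers SET name = 'Bob', updated_at = '2022-05-03T00:00:00' "
    "WHERE id = 2"
)

# Second sync: only the row modified after the stored cursor is read.
second, state = read_incremental(conn, "customers", "updated_at", state)
```

Note that this approach depends on every write updating the cursor column, and it does not see hard deletes, which is one of the gaps a CDC-style method would close.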