airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Databases and Data warehouses Benchmark: Create connections for Incremental Append #12357

Closed: noahkawasaki-airbyte closed this issue 2 years ago

noahkawasaki-airbyte commented 2 years ago

Create the actual connections between sources/destinations for each benchmark dataset using incremental append syncing. Use the octavia CLI to manage these connections in code. Say we have three benchmark datasets (small, medium, large), two database sources (postgres and mysql), and two data warehouses (snowflake and redshift).

We want these connections created:

- postgres - small - incremental append
- postgres - medium - incremental append
- postgres - large - incremental append
- mysql - small - full refresh
- mysql - medium - full refresh
- mysql - large - full refresh

- snowflake - small - incremental append
- snowflake - medium - incremental append
- snowflake - large - incremental append
- redshift - small - incremental append
- redshift - medium - incremental append
- redshift - large - incremental append
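For reference, octavia manages each connection as a `configuration.yaml` checked into the project repo, so each pairing above becomes one such file. The sketch below is only an approximation of the layout octavia generates (resource names and paths here are made up, not from a real project); the real file should come from `octavia generate connection` and then be edited to set the sync modes.

```yaml
# Hypothetical sketch of an octavia-managed connection
# (postgres source -> snowflake destination, small dataset).
# Field names approximate octavia's generated configuration.yaml;
# regenerate with `octavia generate connection` for the real layout.
definition_type: connection
resource_name: postgres-small-incremental-append
source_configuration_path: sources/postgres-small/configuration.yaml
destination_configuration_path: destinations/snowflake/configuration.yaml
configuration:
  status: active
  sync_catalog:
    streams:
      - config:
          sync_mode: incremental
          destination_sync_mode: append
          cursor_field: ["updated_at"]
```

Applying the whole set with `octavia apply` would then create or update all twelve connections in one repeatable step.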

Incremental Append is more complicated to set up because there needs to be a way to manage the incremental-ness of the data in a repeatable way. Since an incremental sync is always looking for new, not-yet-synced data, we will either have to:

1) Have source databases with dynamically changing data
2) Find a way to hack Airbyte into always thinking a certain subset of data is new, perhaps by updating a cursor field somewhere
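Option 2 can be illustrated with a small self-contained simulation. This is not Airbyte code, just a sketch of the mechanism: an incremental-append sync fetches rows whose cursor field is past the saved state, so bumping `updated_at` on a fixed subset of source rows before each benchmark run makes exactly that subset look new again. SQLite stands in for the source database here.

```python
# Sketch of option 2: make a fixed subset of rows look "new" by bumping
# their cursor field. An in-memory SQLite table stands in for the
# postgres/mysql source; incremental_read mimics a cursor-based sync.
import sqlite3
import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)
rows = [(i, f"row-{i}", "2022-01-01T00:00:00") for i in range(10)]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

def incremental_read(cursor_value):
    """Mimic an incremental-append sync: fetch rows past the saved cursor."""
    return conn.execute(
        "SELECT id FROM events WHERE updated_at > ? ORDER BY id",
        (cursor_value,),
    ).fetchall()

# First sync reads everything; the connector then saves max(updated_at).
state = "1970-01-01T00:00:00"
first = incremental_read(state)
state = "2022-01-01T00:00:00"

# A second sync sees nothing new...
second = incremental_read(state)

# ...so bump the cursor field on a known subset to make it "new" again.
now = datetime.datetime(2023, 1, 1).isoformat()
conn.execute("UPDATE events SET updated_at = ? WHERE id < 3", (now,))
third = incremental_read(state)

print(len(first), len(second), len(third))  # 10 0 3
```

A pre-benchmark step like the final `UPDATE` would give each run a deterministic increment without needing genuinely changing source data.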

Either way, this issue is much more complicated.

evantahler commented 2 years ago

Closing in favor of https://github.com/airbytehq/airbyte/issues/15152