Create the actual connections between sources and destinations for each benchmark dataset, using the sync modes listed below. Use the octavia CLI to manage these connections in code. Say we have three benchmark datasets (small, medium, large), two database sources (Postgres and MySQL), and two data warehouse destinations (Snowflake and Redshift).
We want the following connections created (see the octavia sketch after this list):
- postgres - small - incremental append
- postgres - medium - incremental append
- postgres - large - incremental append
- mysql - small - full refresh
- mysql - medium - full refresh
- mysql - large - full refresh
- snowflake - small - incremental append
- snowflake - medium - incremental append
- snowflake - large - incremental append
- redshift - small - incremental append
- redshift - medium - incremental append
- redshift - large - incremental append
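A minimal sketch of what this could look like with octavia, assuming `octavia init` has already been run and that the source and destination configurations exist under `sources/` and `destinations/`. The connection names and file paths below are illustrative, not final:

```bash
# Generate one connection per (source, warehouse, dataset) combination.
# octavia writes each connection to connections/<name>/configuration.yaml.
for size in small medium large; do
  octavia generate connection \
    --source sources/postgres/configuration.yaml \
    --destination destinations/snowflake/configuration.yaml \
    "postgres-snowflake-${size}"
done

# Repeat for the other source/warehouse pairs, then push everything
# to the Airbyte instance:
octavia apply
```

The sync mode itself lives in each generated `configuration.yaml`: per stream, something like `sync_mode: incremental` with `destination_sync_mode: append` (or `sync_mode: full_refresh` with `destination_sync_mode: overwrite` for the MySQL connections), plus a `cursor_field` for the incremental ones, set before running `octavia apply`.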
Incremental Append is more complicated to set up because there needs to be a way to manage the incremental state of the data in a repeatable way. Since an incremental sync only looks for data it has not seen yet, we will either have to:
1) Have source databases with dynamically changing data
2) Find a way to make Airbyte always treat a certain subset of data as new, e.g. if there's a way to update the cursor field somewhere (see the sketch after this list)
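One possible version of option 2, sketched below: since incremental syncs track a cursor column, rewriting that column on a fixed subset of rows between benchmark runs would make the same data look new on every sync. The table name `benchmark_events`, the `updated_at` cursor column, and the `BENCHMARK_DB_URL` variable are all hypothetical:

```bash
# Hypothetical: advance the cursor column on ~10% of rows so the next
# incremental sync picks them up again. Assumes updated_at is the
# cursor field configured for the stream.
psql "$BENCHMARK_DB_URL" -c \
  "UPDATE benchmark_events SET updated_at = now() WHERE mod(id, 10) = 0;"
```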
Either way, this issue is much more complicated.