dbt-labs / jaffle-shop-template

Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via Evidence.

Feat: Add Extract-Load pipeline using Meltano #9

Closed aaronsteers closed 1 year ago

aaronsteers commented 1 year ago

Changes included in this PR

Sample usage (more examples in the updated README.md):

meltano run tap-jaffle-shop target-duckdb

Or using the equivalent 'el' job name:

meltano run el
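
For context, a named job like 'el' is normally declared in meltano.yml. The fragment below is a minimal sketch based on Meltano's documented jobs syntax; the actual definition in this PR is not shown in the thread, so treat the exact layout as an assumption.

```yaml
# meltano.yml (fragment) - assumed layout, not copied from this PR
jobs:
- name: el                           # job name used by `meltano run el`
  tasks:
  - tap-jaffle-shop target-duckdb    # same extractor/loader pair as the explicit command above
```

With a definition like this, `meltano run el` and `meltano run tap-jaffle-shop target-duckdb` would run the same extract-load step.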

aaronsteers commented 1 year ago

@gwenwindflower - I've scaled this way back so the PR now just focuses on EL.

I still have some work to do to make sure env vars and database paths are all aligned. Once CI passes, this should be ready for review though.

aaronsteers commented 1 year ago

CI Pipelines are now succeeding, starting with: https://github.com/dbt-labs/jaffle-shop-template/actions/runs/4558958279/jobs/8042391887

@gwenwindflower - Do you squash PRs as a rule? We definitely would not want to use a merge commit for this PR, since it has so many commits. Happy to squash on my side if helpful.

Also, my latest commit (https://github.com/dbt-labs/jaffle-shop-template/pull/9/commits/90de06597aea6a2b97723bec230f547ebf1f6494) removes the raw data files and the dynamic behavior toggle for external tables, instead assuming the raw data has already landed. Once I removed external table support, I was able to change the tap's default raw schema to match the default you use elsewhere: it is now 'jaffle_raw' instead of the 'tap_jaffle_shop' I had previously.

I'm happy to revert and bring those back, but I thought I'd just include those deletions here so it is certain that the pipeline data is coming from the EL process and not the seed CSV files.
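
To illustrate what the schema change means on the dbt side, a source declaration pointing at the EL-loaded tables might look roughly like this. Only the 'jaffle_raw' schema name comes from this thread; the source name and table names are hypothetical placeholders, and the project's real sources file may differ.

```yaml
# models/staging/__sources.yml (hypothetical path and contents)
version: 2

sources:
  - name: jaffle_shop        # hypothetical source name
    schema: jaffle_raw       # raw schema now written by the EL pipeline
    tables:
      - name: customers      # hypothetical table names
      - name: orders
```

Because the tap and dbt would then agree on 'jaffle_raw', no per-environment schema override should be needed for the raw layer.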

aaronsteers commented 1 year ago

Pipeline is green again and ready for review. Note that 1 year of data takes approximately 3 minutes, and total CI duration is still under 5 minutes.

aaronsteers commented 1 year ago

@gwenwindflower - I found a small performance optimization that will significantly boost tap performance.

I'll bump the tap version once that update is available and I will update the postCreateScript as you proposed.

Will post back here with updated perf numbers when that's done. 👍