Open fivetran-jamie opened 1 year ago
I have a Fivetran customer who is using our Fivetran Platform models but is seeing large costs in their BQ destination due to Fivetran log processing. They say
"I'm not sure which specific [query] may be responsible but looking at the BigQuery job logs directly I can see these three queries account for around $400 of that. These are the queries that start with the headers of:
/ {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.stg_fivetran_log__log"} / / {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.fivetran_log__connector_status"} / / {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "fivetran", "target_name": "prod", "node_id": "model.fivetran_log.fivetran_log__schema_changelog"} /
I think primarily this all comes back to the source Fivetran log table (in BigQuery) being > 200 GB in size and the queries in these models reading all that data on every run, rather than perhaps just the past couple of days worth. I see a lot of tables being fully rebuilt every time."
Would this PR address this issue and help reduce the cost of processing the staging tables of the Fivetran Platform package?
@fivetran-joemarkiewicz bumping this issue following our discussion on performance for this package and also based on my work on #126. stg_fivetran_log__log
and the downstreams would make sense to make incremental, though probably not append strategy.
Is there an existing feature request for this?
Describe the Feature
Add logic to
stg_fivetran__log
to make it an incremental model. Since past records should not change at all once they are inserted, an append strategy would work great here and hopefully limit runtime a bunch!Describe alternatives you've considered
log
table can get super big and unwieldyAre you interested in contributing this feature?
Anything else?
No response