Open jtcohen6 opened 5 months ago
We tested the branch (pip install git+https://github.com/dbt-labs/dbt-bigquery.git@batch-metadata-benchmarking#egg=dbt-bigquery) and can confirm a fantastic improvement!
1156 sources completed in 35 seconds. I ran it twice more and the time was very similar. The current implementation takes between 4 and 12 minutes because of query volatility, whereas the new solution issues a single query, which is not only a massive speed-up but also a huge reduction in volatility. This is a big win.
The only thing I've not yet fully validated is the accuracy of the reported time from the INFORMATION_SCHEMA table. BQ docs state that "The data in the INFORMATION_SCHEMA.TABLE_STORAGE view is not kept in real time, and updates are typically delayed by a few seconds to a few minutes". We use the source_status:fresher+ selector in our build jobs, so the impact is that we might not catch changes immediately, but they would be picked up by our subsequent run. That is potentially a small latency cost, but a reasonable price for the improvement.
@jtcohen6 @MichelleArk please can you remind me what this feature is waiting on before it can be merged?
Is this your first time submitting a feature request?
Describe the feature
Move from running a query per table to performing a batch operation.
Running a single query against INFORMATION_SCHEMA.TABLE_STORAGE appears to be the most efficient way to calculate freshness for all sources in a project that do not define a custom loaded_at_field. Evaluate if there is a Python client API or if we need to run the query ourselves.
Copying from https://github.com/dbt-labs/dbt-bigquery/issues/938#issuecomment-2109370037:
And from https://github.com/dbt-labs/dbt-bigquery/issues/938#issuecomment-2109531126:
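For illustration only (this is not the content of the comments linked above, nor the dbt-bigquery implementation): a minimal sketch of how the single batch query described in this section might be assembled. The helper name `build_batch_freshness_query`, its parameters, and the default `region-us` qualifier are assumptions made for the example. It also assumes all sources live in one project and that the `storage_last_modified_time` column of INFORMATION_SCHEMA.TABLE_STORAGE is an acceptable freshness signal.

```python
from collections import defaultdict


def build_batch_freshness_query(project, sources, region="region-us"):
    """Return one SQL string covering every (dataset, table) pair in `sources`.

    Hypothetical helper for illustration; not part of dbt-bigquery.
    """
    if not sources:
        raise ValueError("at least one source table is required")

    # Group table names by dataset so each dataset gets one compact predicate.
    tables_by_dataset = defaultdict(set)
    for dataset, table in sources:
        tables_by_dataset[dataset].add(table)

    predicates = []
    for dataset in sorted(tables_by_dataset):
        names = ", ".join(f"'{name}'" for name in sorted(tables_by_dataset[dataset]))
        predicates.append(f"(table_schema = '{dataset}' AND table_name IN ({names}))")

    # A real implementation should validate identifiers or use query parameters
    # rather than interpolating them into the SQL text.
    where_clause = "\n   OR ".join(predicates)
    return (
        "SELECT table_schema, table_name, storage_last_modified_time\n"
        f"FROM `{project}`.`{region}`.INFORMATION_SCHEMA.TABLE_STORAGE\n"
        f"WHERE {where_clause}"
    )
```

Calling `build_batch_freshness_query("my-project", [("raw", "orders"), ("raw", "customers")])` yields one statement whose result set has a row per source table, replacing one metadata lookup per table. Whether the string is then run through the Python client or through dbt's own connection is the open question raised above.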
Describe alternatives you've considered
Existing implementation: non-batch, leveraging BigQuery API. This is likely preferable in cases where a project contains only a few sources.
Who will this benefit?
BigQuery users with lots of sources, who want to calculate freshness for them all at once
Are you interested in contributing this feature?
No response
Anything else?
Spike: