thekaveman closed 1 week ago
Warehouse report 📦
Legend (in order of precedence)
Resource type | Indicator | Resolution |
---|---|---|
Large table-materialized model | Orange | Make the model incremental |
Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
View with more than one child | Yellow | Materialize as a table or incremental |
Incremental | Light green | |
Table | Green | |
View | White | |
@thekaveman Just curious, is it expected that the storage size of fct_benefits_events
is still 7.8 GB? (at least according to https://github.com/cal-itp/data-infra/pull/3547#issuecomment-2474936523...)
Yeah I have no idea what that means / represents. I guess I thought it would go down too... but :shrug:
Description
TL;DR: we can filter out about 26.5 million records (of roughly 27 million total!) from the raw Amplitude data that we don't need in the final warehouse fact table / model.
Full details in Slack thread: https://cal-itp.slack.com/archives/C037Y3UE71P/p1731533304019569
Type of change
How has this been tested?
Before this change
Note:
CREATE TABLE (26.9m rows, 73.1 GiB processed) in 14.09s
With this change
Note:
CREATE TABLE (402.1k rows, 73.1 GiB processed) in 15.43s
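As a quick sanity check, the two log lines above can be compared against the "about 26.5 million" figure in the description. This is only a sketch: the row counts are the rounded values printed in the logs (26.9m and 402.1k), so the result is approximate, and the variable names are made up for illustration.

```python
# Row counts as printed in the two CREATE TABLE log lines.
# Both are rounded by the log output, so treat the results as approximate.
rows_before = 26_900_000  # "26.9m rows" before this change
rows_after = 402_100      # "402.1k rows" with this change

rows_removed = rows_before - rows_after
pct_removed = 100 * rows_removed / rows_before

print(f"{rows_removed:,} rows removed ({pct_removed:.1f}%)")
# prints "26,497,900 rows removed (98.5%)"
```

The difference matches the roughly 26.5 million records mentioned in the description. Note that the bytes processed figure is the same (73.1 GiB) in both logs, so the change shows up in the row count of the resulting table rather than in the amount of data read to build it.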
Post-merge follow-ups
Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.