Closed erikamov closed 1 month ago
Warehouse report 📦
Legend (in order of precedence)
| Resource type | Indicator | Resolution |
|---|---|---|
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |
Description
This PR optimizes `fct_daily_rt_feed_validation_notices` for issue #3481.

The structure of the query:
1. Get all daily feeds (a record of retrievals per URL and day).
2. For each URL/day, create a row for each error code (there are 57).
3. For each URL/day/code, aggregate the validation runs, aggregate the warning/error messages, and calculate some metrics.

After pairing with @ohrite, we believe the issue is the location of the CROSS JOIN. We are investigating moving the cross join out to a CTE with the daily feeds table, which results in a query that finishes at about the 50 minute mark.
After removing the join to notices, the query finishes at the 20 minute mark. Re-adding the join to notices brings the runtime back up to just under 40 minutes. Our ongoing investigation will focus on speeding up the notice occurrence list aggregation.
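For reviewers who want to see the query shape in miniature, here is a rough sketch of the "cross join in a CTE, then join to notices and aggregate" structure described above. It is an illustration only, not the model's actual SQL: it runs on an in-memory SQLite database rather than BigQuery, and the table names (`daily_feeds`, `codes`, `notices`), column names, and sample rows are all hypothetical. It says nothing about performance.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the real warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_feeds (url TEXT, dt TEXT);
CREATE TABLE codes (code TEXT);
CREATE TABLE notices (url TEXT, dt TEXT, code TEXT, message TEXT);

INSERT INTO daily_feeds VALUES
  ('a.example', '2024-01-01'),
  ('b.example', '2024-01-01');
INSERT INTO codes VALUES ('E001'), ('E002'), ('W001');
INSERT INTO notices VALUES
  ('a.example', '2024-01-01', 'E001', 'msg 1'),
  ('a.example', '2024-01-01', 'E001', 'msg 2'),
  ('b.example', '2024-01-01', 'W001', 'msg 3');
""")

QUERY = """
WITH feed_codes AS (
    -- Steps 1 and 2: one row per URL/day/code, built once in a CTE.
    SELECT f.url, f.dt, c.code
    FROM daily_feeds AS f
    CROSS JOIN codes AS c
)
-- Step 3: attach notices and aggregate per URL/day/code.
SELECT fc.url, fc.dt, fc.code, COUNT(n.message) AS total_notices
FROM feed_codes AS fc
LEFT JOIN notices AS n
  ON n.url = fc.url AND n.dt = fc.dt AND n.code = fc.code
GROUP BY fc.url, fc.dt, fc.code
ORDER BY fc.url, fc.dt, fc.code
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `LEFT JOIN` keeps URL/day/code combinations that have no notices, so every feed still gets one row per code with a count of zero.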
Type of change
How has this been tested?
It was tested locally by generating the table on staging:
`cal-itp-data-infra-staging.erika_mart_gtfs_quality.fct_daily_rt_feed_validation_notices`
Post-merge follow-ups
Monitor the DAGs `transform_warehouse` and `transform_warehouse_full_refresh_sunday` to confirm that no more errors occur when building `fct_daily_rt_feed_validation_notices`.