cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Changes report evaluations for MTC from regional to subfeeds #3486

Closed vevetron closed 1 month ago

vevetron commented 1 month ago

Description

In 2023, a decision was made for the reports site to show GTFS check results for MTC subfeed agencies such as SMART, even though those checks are run on MTC's regional feed.

Individual agencies have no way of noticing this, and SMART has complained that it is unfair for their evaluation to receive a failing grade when the fault lies with MTC's regional GTFS feed.

Resolves https://github.com/cal-itp/reports/issues/315

Type of change

How has this been tested?

poetry run dbt run --full-refresh --models +fct_daily_organization_combined_guideline_checks

Extensive SQL query analysis was done in staging.

This query produces the differential between the two filters, listing the rows that the original filter (`public_customer_facing_fixed_route`) keeps but the new filter removes:

```sql
SELECT *
FROM `cal-itp-data-infra-staging.vb_staging.int_gtfs_quality__guideline_checks_long`
WHERE
  -- organization_name = 'Sonoma-Marin Area Rail Transit District' AND
  date = '2024-05-15'
  AND check = 'No errors in MobilityData GTFS Schedule Validator'
  AND organization_key IS NOT NULL
  AND public_customer_facing_fixed_route
  AND key NOT IN (
    SELECT key
    FROM `cal-itp-data-infra-staging.vb_staging.int_gtfs_quality__guideline_checks_long`
    WHERE
      -- organization_name = 'Sonoma-Marin Area Rail Transit District' AND
      date = '2024-05-15'
      AND check = 'No errors in MobilityData GTFS Schedule Validator'
      AND organization_key IS NOT NULL
      AND (
        (use_subfeed_for_reports AND public_customer_facing_or_regional_subfeed_fixed_route)
        OR (NOT use_subfeed_for_reports AND public_customer_facing_fixed_route)
      )
  )
```
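The filter swap in the differential query can be sketched in plain Python to make the boolean logic easier to check. This is a minimal sketch, not code from the PR: `CheckRow`, `old_filter`, `new_filter`, `removed_keys`, `added_keys`, and the sample rows are hypothetical. Only the column names come from the query, and SQL NULL semantics are ignored (every flag is assumed to be a non-null boolean).

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckRow:
    """Hypothetical, simplified row of int_gtfs_quality__guideline_checks_long.

    Only the columns used by the two filters are modeled, and every flag is
    assumed to be a non-null boolean (SQL NULL handling is ignored).
    """

    key: str
    public_customer_facing_fixed_route: bool
    use_subfeed_for_reports: bool
    public_customer_facing_or_regional_subfeed_fixed_route: bool


def old_filter(row: CheckRow) -> bool:
    # Original rule: keep checks on public, customer-facing fixed-route feeds.
    return row.public_customer_facing_fixed_route


def new_filter(row: CheckRow) -> bool:
    # New rule, equivalent to the SQL
    #   (use_subfeed_for_reports AND public_customer_facing_or_regional_subfeed_fixed_route)
    #   OR (NOT use_subfeed_for_reports AND public_customer_facing_fixed_route)
    # when no flag is NULL.
    if row.use_subfeed_for_reports:
        return row.public_customer_facing_or_regional_subfeed_fixed_route
    return row.public_customer_facing_fixed_route


def removed_keys(rows: Sequence[CheckRow]) -> set[str]:
    # Mirrors `key NOT IN (subquery)`: kept by the old rule, dropped by the new one.
    kept_by_new = {r.key for r in rows if new_filter(r)}
    return {r.key for r in rows if old_filter(r) and r.key not in kept_by_new}


def added_keys(rows: Sequence[CheckRow]) -> set[str]:
    # The reverse differential: admitted only by the new rule.
    kept_by_old = {r.key for r in rows if old_filter(r)}
    return {r.key for r in rows if new_filter(r) and r.key not in kept_by_old}


if __name__ == "__main__":
    sample = [
        CheckRow("k1", True, True, False),   # flagged org, old-only row
        CheckRow("k2", False, True, True),   # flagged org, new-only row
        CheckRow("k3", True, False, False),  # unflagged org, unchanged
    ]
    print(removed_keys(sample))  # {'k1'}
    print(added_keys(sample))  # {'k2'}
```

The query above only surfaces rows the new filter drops (the `removed_keys` case); rows that only the new filter admits, like `k2` in the sample, would need the reverse `NOT IN` query.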

Post-merge follow-ups


Evaluate: this fix makes two agencies disappear from downstream SQL tables, but neither of them appears on the reports website anyway.

github-actions[bot] commented 1 month ago

Warehouse report 📦

DAG

Legend (in order of precedence)

| Resource type | Indicator | Resolution |
|---|---|---|
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |
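"In order of precedence" means the first matching legend row wins. A hypothetical sketch of that lookup follows, assuming the caller already knows whether a model counts as "large" (the report does not state the threshold). `ModelInfo`, `BASE_INDICATOR`, and `classify` are illustrative names, not the bot's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical summary of one dbt model. The report does not define "large",
    # so the caller supplies that judgment.
    materialization: str  # "table", "incremental", or "view"
    is_large: bool
    is_partitioned_or_clustered: bool
    num_children: int


# Fallback indicator per materialization when no warning row matches.
BASE_INDICATOR = {"incremental": "Light green", "table": "Green", "view": "White"}


def classify(model: ModelInfo) -> tuple[str, Optional[str]]:
    """Return (indicator, resolution); rows are checked in the legend's order."""
    if model.is_large and model.materialization == "table":
        return "Orange", "Make the model incremental"
    if model.is_large and not model.is_partitioned_or_clustered:
        return "Orange", "Add partitioning and/or clustering"
    if model.materialization == "view" and model.num_children > 1:
        return "Yellow", "Materialize as a table or incremental"
    return BASE_INDICATOR[model.materialization], None


if __name__ == "__main__":
    print(classify(ModelInfo("table", True, False, 0)))
    # ('Orange', 'Make the model incremental')
    print(classify(ModelInfo("view", False, False, 2)))
    # ('Yellow', 'Materialize as a table or incremental')
```

Checking the rows top to bottom means a large, unpartitioned table is reported only with the first resolution, which matches how a precedence-ordered legend reads.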

vevetron commented 1 month ago

This change will cause many reports to look bad: https://docs.google.com/spreadsheets/d/1BouW5I-FCElAQPoIe84d-o0UQe3-Y4xFIyHRpo2KYLE/edit?usp=sharing