cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0
48 stars 13 forks source link

New mart NTD service models #3517

Closed erikamov closed 3 weeks ago

erikamov commented 4 weeks ago

Description

This PR replaces the existing model dim_annual_ntd_agency_service as requested on ticket #3396 to use the new 2022 external NTD Annual Data.

Following these details sent by @tiffanychu90:

Screenshot 2024-10-28 at 4 09 20 PM

I created these new models, with documentation and basic test:

Type of change

How has this been tested?

The new models and documentation were tested locally an created on staging.

❯ poetry run dbt run -s +"models/mart/ntd/dim_annual_service_agencies.sql"
23:23:04  Running with dbt=1.5.1
23:23:05  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:23:05
23:23:08  Concurrency: 8 threads (target='dev')
23:23:08
23:23:08  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_agency  [RUN]
23:23:09  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_agency  [CREATE VIEW (0 processed) in 1.04s]
23:23:09  2 of 2 START sql table model erika_mart_ntd.dim_annual_service_agencies ........ [RUN]
23:23:13  2 of 2 OK created sql table model erika_mart_ntd.dim_annual_service_agencies ... [CREATE TABLE (2.2k rows, 7.9 MiB processed) in 3.27s]
23:23:13
23:23:13  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 7.38 seconds (7.38s).
23:23:13
23:23:13  Completed successfully
23:23:13
23:23:13  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

❯ poetry run dbt test -s models/mart/ntd/dim_annual_service_agencies.sql
23:32:57  1 of 3 START test accepted_values_dim_annual_service_agencies_report_year__2022  [RUN]
23:32:57  2 of 3 START test not_null_dim_annual_service_agencies_key ..................... [RUN]
23:32:57  3 of 3 START test unique_dim_annual_service_agencies_key ....................... [RUN]
23:32:58  2 of 3 PASS not_null_dim_annual_service_agencies_key ........................... [PASS in 0.74s]
23:32:58  1 of 3 PASS accepted_values_dim_annual_service_agencies_report_year__2022 ...... [PASS in 0.75s]
23:32:58  3 of 3 PASS unique_dim_annual_service_agencies_key ............................. [PASS in 0.76s]
23:32:58
23:32:58  Finished running 3 tests in 0 hours 0 minutes and 1.90 seconds (1.90s).
23:32:58
23:32:58  Completed successfully
23:32:58
23:32:58  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
❯ poetry run dbt run -s +models/mart/ntd/dim_annual_service_mode_time_periods.sql
23:24:20  Running with dbt=1.5.1
23:24:21  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:24:21
23:24:24  Concurrency: 8 threads (target='dev')
23:24:24
23:24:24  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode_and_time_period  [RUN]
23:24:25  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode_and_time_period  [CREATE VIEW (0 processed) in 0.94s]
23:24:25  2 of 2 START sql table model erika_mart_ntd.dim_annual_service_mode_time_periods  [RUN]
23:24:29  2 of 2 OK created sql table model erika_mart_ntd.dim_annual_service_mode_time_periods  [CREATE TABLE (9.2k rows, 50.6 MiB processed) in 4.33s]
23:24:29
23:24:29  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.01 seconds (8.01s).
23:24:29
23:24:29  Completed successfully
23:24:29
23:24:29  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

❯ poetry run dbt test -s models/mart/ntd/dim_annual_service_mode_time_periods.sql
23:36:07  1 of 4 START test accepted_values_dim_annual_service_mode_time_periods_report_year__2022  [RUN]
23:36:07  2 of 4 START test accepted_values_dim_annual_service_mode_time_periods_time_period__Average_Weekday_AM_Peak__Average_Weekday_Midday__Average_Weekday_PM_Peak__Average_Weekday_Other__Average_Typical_Weekday__Average_Typical_Saturday__Average_Typical_Sunday  [RUN]
23:36:07  3 of 4 START test not_null_dim_annual_service_mode_time_periods_key ............ [RUN]
23:36:07  4 of 4 START test unique_dim_annual_service_mode_time_periods_key .............. [RUN]
23:36:08  3 of 4 PASS not_null_dim_annual_service_mode_time_periods_key .................. [PASS in 0.95s]
23:36:08  1 of 4 PASS accepted_values_dim_annual_service_mode_time_periods_report_year__2022  [PASS in 0.95s]
23:36:08  2 of 4 PASS accepted_values_dim_annual_service_mode_time_periods_time_period__Average_Weekday_AM_Peak__Average_Weekday_Midday__Average_Weekday_PM_Peak__Average_Weekday_Other__Average_Typical_Weekday__Average_Typical_Saturday__Average_Typical_Sunday  [PASS in 0.95s]
23:36:08  4 of 4 PASS unique_dim_annual_service_mode_time_periods_key .................... [PASS in 1.00s]
23:36:08
23:36:08  Finished running 4 tests in 0 hours 0 minutes and 2.12 seconds (2.12s).
23:36:08
23:36:08  Completed successfully
23:36:08
23:36:08  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4
❯ poetry run dbt run -s +models/mart/ntd/fct_annual_service_modes.sql
23:25:02  Running with dbt=1.5.1
23:25:04  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:25:04
23:25:07  Concurrency: 8 threads (target='dev')
23:25:07
23:25:07  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode  [RUN]
23:25:08  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode  [CREATE VIEW (0 processed) in 1.00s]
23:25:08  2 of 2 START sql table model erika_mart_ntd.fct_annual_service_modes ........... [RUN]
23:25:11  2 of 2 OK created sql table model erika_mart_ntd.fct_annual_service_modes ...... [CREATE TABLE (3.7k rows, 16.2 MiB processed) in 3.39s]
23:25:11
23:25:11  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 7.51 seconds (7.51s).
23:25:11
23:25:11  Completed successfully
23:25:11
23:25:11  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

❯ poetry run dbt test -s models/mart/ntd/fct_annual_service_modes.sql
23:36:56  1 of 4 START test accepted_values_fct_annual_service_modes_max_time_period__Annual_Total  [RUN]
23:36:56  2 of 4 START test accepted_values_fct_annual_service_modes_report_year__2022 ... [RUN]
23:36:56  3 of 4 START test not_null_fct_annual_service_modes_key ........................ [RUN]
23:36:56  4 of 4 START test unique_fct_annual_service_modes_key .......................... [RUN]
23:36:57  1 of 4 PASS accepted_values_fct_annual_service_modes_max_time_period__Annual_Total  [PASS in 0.72s]
23:36:57  3 of 4 PASS not_null_fct_annual_service_modes_key .............................. [PASS in 0.72s]
23:36:57  4 of 4 PASS unique_fct_annual_service_modes_key ................................ [PASS in 0.73s]
23:36:57  2 of 4 PASS accepted_values_fct_annual_service_modes_report_year__2022 ......... [PASS in 0.76s]
23:36:57
23:36:57  Finished running 4 tests in 0 hours 0 minutes and 1.92 seconds (1.92s).
23:36:57
23:36:57  Completed successfully
23:36:57
23:36:57  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

Post-merge follow-ups

Confirm the correct creation of the models on cal-itp-data-infra.

github-actions[bot] commented 4 weeks ago

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

New models 🌱

calitp_warehouse.mart.ntd.dim_annual_service_agencies

calitp_warehouse.mart.ntd.dim_annual_service_mode_time_periods

calitp_warehouse.mart.ntd.fct_annual_service_modes

DAG

Legend (in order of precedence)

Resource type Indicator Resolution
Large table-materialized model Orange Make the model incremental
Large model without partitioning or clustering Orange Add partitioning and/or clustering
View with more than one child Yellow Materialize as a table or incremental
Incremental Light green
Table Green
View White

tiffanychu90 commented 3 weeks ago
tiffanychu90 commented 3 weeks ago
jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (3396-ntd-agency) $ poetry run dbt run -s +"models/mart/ntd/dim_annual_service_agencies.sql"
22:03:36  Running with dbt=1.5.1
22:03:39  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
22:03:40  Found 422 models, 973 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
22:03:40  
22:03:44  Concurrency: 8 threads (target='dev')
22:03:44  
22:03:44  1 of 2 START sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_agency  [RUN]
22:03:45  1 of 2 OK created sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_agency  [CREATE VIEW (0 processed) in 1.07s]
22:03:45  2 of 2 START sql table model tiffany_mart_ntd.dim_annual_service_agencies ...... [RUN]
22:03:48  2 of 2 OK created sql table model tiffany_mart_ntd.dim_annual_service_agencies . [CREATE TABLE (2.2k rows, 7.9 MiB processed) in 3.61s]
22:03:48  
22:03:48  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.41 seconds (8.41s).
22:03:49  
22:03:49  Completed successfully
jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (3396-ntd-agency) $ poetry run dbt run -s +models/mart/ntd/fct_annual_service_modes.sql
22:05:34  Running with dbt=1.5.1
22:05:37  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
22:05:38  Found 422 models, 973 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
22:05:38  
22:05:42  Concurrency: 8 threads (target='dev')
22:05:42  
22:05:42  1 of 2 START sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_mode  [RUN]
22:05:43  1 of 2 OK created sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_mode  [CREATE VIEW (0 processed) in 1.01s]
22:05:43  2 of 2 START sql table model tiffany_mart_ntd.fct_annual_service_modes ......... [RUN]
22:05:46  2 of 2 OK created sql table model tiffany_mart_ntd.fct_annual_service_modes .... [CREATE TABLE (3.7k rows, 16.2 MiB processed) in 3.59s]
22:05:46  
22:05:46  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.36 seconds (8.36s).
22:05:47  
22:05:47  Completed successfully
22:05:47  
erikamov commented 3 weeks ago
  • Slack thread
  • I'll take a TODO to write up a GH issue to do some of the dbt docs + macros and start working on moving columns that are broadly present across many NTD tables into that

    • This PR should remain the mart NTD table creation, and a follow-up issue + PR can address docs
  • One remaining question: why are the columns read in from stg_ntd_annual_data__2022__service_by_agency have the max_ or sum_ prefix?

The column names with SUM and MAX actually came from the original NTD API data. I just kept them like it came to us, but I agree with you that is better without it. Thank you for all changes and information. 🤩