cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0
48 stars 13 forks source link

Fix: reorder Benefits mart fields #3518

Closed thekaveman closed 2 weeks ago

thekaveman commented 3 weeks ago

~This PR is currently on top of #3515, so I could test this locally more easily. Once that PR is merged, this can be rebased on main.~

Description

Follow up to #3468.

The UNION DISTINCT was producing strange results, e.g. combining results from different columns into the same output column.

This was due to one or both of:

  1. The order of the columns not being the same between the baseline and transform CTE
  2. Some columns missing from the transform CTE

This PR alphabetizes the fields in both SELECT statements, and ensures that the transform SELECTS all fields from the baseline CTE.

A minor side effect of this cleanup was to remove the transform CTE in favor of a direct SELECT statement. This simplifies the back-and-forth to keep the field order straight.

This is a full list of the fields in alphabetical order (from BigQuery)

amplitude_id
app
city
client_event_time
client_upload_time
country
device_family
device_id
device_type
event_id
event_properties_card_tokenize_func
event_properties_card_tokenize_url
event_properties_claims_provider
event_properties_eligibility_verifier
event_properties_enrollment_flows
event_properties_enrollment_group
event_properties_enrollment_method
event_properties_error_name
event_properties_error_status
event_properties_error_sub
event_properties_href
event_properties_language
event_properties_origin
event_properties_path
event_properties_status
event_properties_transit_agency
event_time
event_type
language
library
os_name
os_version
processed_time
region
server_received_time
server_upload_time
session_id
start_version
user_id
user_properties_eligibility_verifier
user_properties_enrollment_flows
user_properties_enrollment_method
user_properties_initial_referrer
user_properties_initial_referring_domain
user_properties_referrer
user_properties_referring_domain
user_properties_transit_agency
user_properties_user_agent
uuid
version_name

Type of change

How has this been tested?

$ poetry run dbt run -s +fct_benefits_events
03:26:13  Running with dbt=1.5.1
03:26:15  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
03:26:15  Found 420 models, 950 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
03:26:15  
03:26:18  Concurrency: 8 threads (target='dev')
03:26:18  
03:26:18  1 of 2 START sql view model kegan_staging.stg_amplitude__benefits_events ....... [RUN]
03:26:19  1 of 2 OK created sql view model kegan_staging.stg_amplitude__benefits_events .. [CREATE VIEW (0 processed) in 1.06s]
03:26:19  2 of 2 START sql table model kegan_mart_benefits.fct_benefits_events ........... [RUN]
03:26:34  2 of 2 OK created sql table model kegan_mart_benefits.fct_benefits_events ...... [CREATE TABLE (26.9m rows, 73.1 GiB processed) in 14.77s]
03:26:34  
03:26:34  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 19.02 seconds (19.02s).
03:26:34  
03:26:34  Completed successfully
03:26:34  
03:26:34  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Post-merge follow-ups

github-actions[bot] commented 3 weeks ago

Warehouse report 📦

DAG

Legend (in order of precedence)

Resource type Indicator Resolution
Large table-materialized model Orange Make the model incremental
Large model without partitioning or clustering Orange Add partitioning and/or clustering
View with more than one child Yellow Materialize as a table or incremental
Incremental Light green
Table Green
View White