ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
6 stars 4 forks source link

Use jinja templating to reduce duplicated operations in `reporting.res_report_summary` #409

Closed wrridgeway closed 5 months ago

wrridgeway commented 5 months ago

reporting.res_report_summary performs a lot of repetitive operations across a number of CTEs. Creating a jinja template for these operations will save us code and better ensure accuracy.

wrridgeway commented 5 months ago

@dfsnow or @jeancochrane is this the best way to implement this? As in: should I have these macros stored in one script, with a name that pertains to the only script they should be used in?

I'll do testing once I'm sure the implementation is correct.

wrridgeway commented 5 months ago

Testing

This query tests if there are any groupings that don't join between the old and new table, as well as whether there are any discrepancies in the number of pins per group or the sale price stats (the columns that are/were deterministic across new/old tables).

select
    *
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
where old.pin_n != new.pin_n
    or old.pin_n is null
    or new.pin_n is null
    or old.sale_min != new.sale_min
    or old.sale_max != new.sale_max

Nothing is returned, which means we're good here.

This query is to compare median FMV between old and new tables. We don't expect them to be the same since approx_percentile is not deterministic, but we do expect them to be similar.

select
    old.geography_id,
    old.property_group,
    old.year,
    old.assessment_stage,
    old.fmv_median as old_fmv,
    new.fmv_median as new_fmv
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
where old.fmv_median != new.fmv_median
limit 10
geography_id property_group year assessment_stage old_fmv new_fmv
23170 ALL REGRESSION 2023 BOR CERTIFIED 1613995 1619807
30053 ALL REGRESSION 2022 BOR CERTIFIED 258613.9 258602.3
22080 ALL REGRESSION 2022 BOR CERTIFIED 319848.6 320045.6
22111 SF 2021 ASSESSOR CERTIFIED 291400.2 291360.6
72 MF 2024 model 229943.5 230475.8
17042 ALL REGRESSION 2022 ASSESSOR CERTIFIED 847894.4 847870.3
32310 SF 2022 model 229230.8 229411.8
22080 ALL REGRESSION 2023 ASSESSOR CERTIFIED 319966.4 320072.2
17 ALL REGRESSION 2024 model 286952.3 287376.3
37190 CONDO 2024 model 142525 142420

While we don't expect class modes to always match (since the old table wasn't deterministic) we do expect them to match the vast majority of the time since there won't usually be ties between the max number of classes for a grouping.

select
   old.class_mode = new.class_mode as "class modes match",
   count(*) as count
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
group by old.class_mode = new.class_mode
class modes match count
FALSE 86
TRUE 35875

Which is what we see.