Use jinja templating to reduce duplicated operations in `reporting.res_report_summary`

wrridgeway commented 5 months ago

reporting.res_report_summary performs a lot of repetitive operations across a number of CTEs. Creating a jinja template for these operations will save us code and better ensure accuracy.

wrridgeway commented 5 months ago

@dfsnow or @jeancochrane is this the best way to implement this? As in: should I have these macros stored in one script, with a name that pertains to the only script they should be used in?

I'll do testing once I'm sure the implementation is correct.

wrridgeway commented 5 months ago

Testing

This query tests if there are any groupings that don't join between the old and new table, as well as whether there are any discrepancies in the number of pins per group or the sale price stats (the columns that are/were deterministic across new/old tables).

select
    *
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
where old.pin_n != new.pin_n
    or old.pin_n is null
    or new.pin_n is null
    or old.sale_min != new.sale_min
    or old.sale_max != new.sale_max

Nothing is returned, which means we're good here.

This query is to compare median FMV between old and new tables. We don't expect them to be the same since approx_percentile is not deterministic, but we do expect them to be similar.

select
    old.geography_id,
    old.property_group,
    old.year,
    old.assessment_stage,
    old.fmv_median as old_fmv,
    new.fmv_median as new_fmv
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
where old.fmv_median != new.fmv_median
limit 10

geography_id	property_group	year	assessment_stage	old_fmv	new_fmv
23170	ALL REGRESSION	2023	BOR CERTIFIED	1613995	1619807
30053	ALL REGRESSION	2022	BOR CERTIFIED	258613.9	258602.3
22080	ALL REGRESSION	2022	BOR CERTIFIED	319848.6	320045.6
22111	SF	2021	ASSESSOR CERTIFIED	291400.2	291360.6
72	MF	2024	model	229943.5	230475.8
17042	ALL REGRESSION	2022	ASSESSOR CERTIFIED	847894.4	847870.3
32310	SF	2022	model	229230.8	229411.8
22080	ALL REGRESSION	2023	ASSESSOR CERTIFIED	319966.4	320072.2
17	ALL REGRESSION	2024	model	286952.3	287376.3
37190	CONDO	2024	model	142525	142420

While we don't expect class modes to always match (since the old table wasn't deterministic) we do expect them to match the vast majority of the time since there won't usually be ties between the max number of classes for a grouping.

select
   old.class_mode = new.class_mode as "class modes match",
   count(*) as count
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
    on old.geography_id = new.geography_id
    and old.property_group = new.property_group
    and old.year = new.year
    and old.assessment_stage = new.assessment_stage
group by old.class_mode = new.class_mode

class modes match	count
FALSE	86
TRUE	35875

Which is what we see.

ccao-data / data-architecture

Use jinja templating to reduce duplicated operations in `reporting.res_report_summary` #409

Testing