Closed wrridgeway closed 5 months ago
@dfsnow or @jeancochrane is this the best way to implement this? As in: should I have these macros stored in one script, with a name that pertains to the only script they should be used in?
I'll do testing once I'm sure the implementation is correct.
This query tests if there are any groupings that don't join between the old and new table, as well as whether there are any discrepancies in the number of pins per group or the sale price stats (the columns that are/were deterministic across new/old tables).
select
*
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
on old.geography_id = new.geography_id
and old.property_group = new.property_group
and old.year = new.year
and old.assessment_stage = new.assessment_stage
where old.pin_n != new.pin_n
or old.pin_n is null
or new.pin_n is null
or old.sale_min != new.sale_min
or old.sale_max != new.sale_max
Nothing is returned, which means we're good here.
This query is to compare median FMV between old and new tables. We don't expect them to be the same since approx_percentile
is not deterministic, but we do expect them to be similar.
select
old.geography_id,
old.property_group,
old.year,
old.assessment_stage,
old.fmv_median as old_fmv,
new.fmv_median as new_fmv
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
on old.geography_id = new.geography_id
and old.property_group = new.property_group
and old.year = new.year
and old.assessment_stage = new.assessment_stage
where old.fmv_median != new.fmv_median
limit 10
geography_id | property_group | year | assessment_stage | old_fmv | new_fmv |
---|---|---|---|---|---|
23170 | ALL REGRESSION | 2023 | BOR CERTIFIED | 1613995 | 1619807 |
30053 | ALL REGRESSION | 2022 | BOR CERTIFIED | 258613.9 | 258602.3 |
22080 | ALL REGRESSION | 2022 | BOR CERTIFIED | 319848.6 | 320045.6 |
22111 | SF | 2021 | ASSESSOR CERTIFIED | 291400.2 | 291360.6 |
72 | MF | 2024 | model | 229943.5 | 230475.8 |
17042 | ALL REGRESSION | 2022 | ASSESSOR CERTIFIED | 847894.4 | 847870.3 |
32310 | SF | 2022 | model | 229230.8 | 229411.8 |
22080 | ALL REGRESSION | 2023 | ASSESSOR CERTIFIED | 319966.4 | 320072.2 |
17 | ALL REGRESSION | 2024 | model | 286952.3 | 287376.3 |
37190 | CONDO | 2024 | model | 142525 | 142420 |
While we don't expect class modes to always match (since the old table wasn't deterministic) we do expect them to match the vast majority of the time since there won't usually be ties between the max number of classes for a grouping.
select
old.class_mode = new.class_mode as "class modes match",
count(*) as count
from "reporting".res_report_summary as old
full outer join "z_ci_401-use-jinja-templating-to-reduce-duplicated-operations-in-reportingres-report-summary_reporting".res_report_summary as new
on old.geography_id = new.geography_id
and old.property_group = new.property_group
and old.year = new.year
and old.assessment_stage = new.assessment_stage
group by old.class_mode = new.class_mode
class modes match | count |
---|---|
FALSE | 86 |
TRUE | 35875 |
Which is what we see.
reporting.res_report_summary
performs a lot of repetitive operations across a number of CTEs. Creating a jinja template for these operations will save us code and better ensure accuracy.