Closed wrridgeway closed 2 months ago
WITH old AS (
SELECT
year,
stage,
class,
count(*) AS count,
avg(tot_sum) AS "avg"
FROM reporting.vw_assessment_roll
GROUP BY year, stage, class
),
new AS (
SELECT
year,
stage,
class,
count(*) AS count,
avg(tot_sum) AS "avg"
FROM "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_assessment_roll
GROUP BY year, stage, class
)
SELECT * FROM old
EXCEPT
SELECT * FROM new
Yields:
year | stage | class | count | avg |
---|---|---|---|---|
2023 | assessor certified | O | 1 | 0 |
2023 | mailed | O | 1 | 0 |
2023 | bor certified | O | 1 | 0 |
Looking at 2023 more closely, we see the differences are cause by one parcel switching from class 'O' to '2'. This is caused by a disagreement between pardat (old source for classes) and asmt_all (new source for classes).
SELECT
old.year,
old.stage,
old.class,
old.township_name,
old.n AS old_n,
new.n AS new_n
FROM reporting.vw_assessment_roll old
LEFT JOIN "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_assessment_roll new
ON old.year = new.year
AND old.stage = new.stage
AND old.class = new.class
AND old.township_name = new.township_name
WHERE old.n != new.n
OR new.n IS NULL
year | stage | class | township_name | old_n | new_n |
---|---|---|---|---|---|
2023 | bor certified | O | Riverside | 1 | |
2023 | mailed | 2 | Riverside | 5908 | 5909 |
2023 | assessor certified | 2 | Riverside | 5908 | 5909 |
2023 | bor certified | 2 | Riverside | 5908 | 5909 |
2023 | mailed | O | Riverside | 1 | |
2023 | assessor certified | O | Riverside | 1 |
All AVs across the two views match, except for 2023 class 2 values because of this one parcel. EX AVs match because adding another exempt parcel to the group won't change sums, medians, etc.
Return any rows that don't join by pin and year across the old and new views.
SELECT * FROM reporting.vw_top_5 old
FULL OUTER JOIN "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_top_5 new
ON old.parid = new.pin
AND old.year = new.year
WHERE old.parid IS NULL
OR new.pin IS NULL
Doesn't return any rows that don't join. Are there any differences between the views even if they have the same pins by year?
WITH differences AS (
SELECT * FROM reporting.vw_top_5 old
EXCEPT
SELECT * FROM "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_top_5 new
)
SELECT
old.total_av AS old_av,
new.total_av AS new_av,
old.address AS old_address,
new.address AS new_address
FROM reporting.vw_top_5 old
FULL OUTER JOIN "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_top_5 new
ON old.parid = new.pin
AND old.year = new.year
WHERE old.parid IN (SELECT parid FROM differences)
Yes. All of the values are the same, but addresses can vary. New addresses come from vw_pin_address
rather than being constructed within vw_top_5
itself:
old_av | new_av | old_address | new_address |
---|---|---|---|
7685953 | 7685953 | 0 UNKNOWN | UNKNOWN |
2725266 | 2725266 | 0 183RD ST | 183RD ST |
1640318 | 1640318 | 0 UNKNOWN | UNKNOWN |
5419988 | 5419988 | 0 UNKNOWN | UNKNOWN |
3134002 | 3134002 | 0 UNKNOWN | UNKNOWN |
7855492 | 7855492 | 0 W DUNDEE RD | W DUNDEE RD |
5735807 | 5735807 | 0 UNKNOWN | UNKNOWN |
5735807 | 5735807 | 0 UNKNOWN | UNKNOWN |
8744961 | 8744961 | 0 UNKNOWN | UNKNOWN |
7080684 | 7080684 | 0 UNKNOWN | UNKNOWN |
2999809 | 2999809 | 0 183RD ST | 183RD ST |
2725266 | 2725266 | 0 183RD ST | 183RD ST |
2355590 | 2355590 | 0 N OAK PARK AVE | N OAK PARK AVE |
4821389 | 4821389 | 0 RIDGE RD | RIDGE RD |
4005278 | 4005278 | 0 UNKNOWN | UNKNOWN |
66090511 | 66090511 | 0 UNKNOWN | UNKNOWN |
5165229 | 5165229 | 0 UNKNOWN | UNKNOWN |
1966845 | 1966845 | 0 UNKNOWN | UNKNOWN |
8828665 | 8828665 | 0 P O BOX A-3879 | P O BOX A-3879 |
1966845 | 1966845 | 0 UNKNOWN | UNKNOWN |
8828665 | 8828665 | 0 P O BOX A-3879 | P O BOX A-3879 |
7714497 | 7714497 | 0 P O BOX A-3879 | P O BOX A-3879 |
7714497 | 7714497 | 0 P O BOX A-3879 | P O BOX A-3879 |
2355590 | 2355590 | 0 N OAK PARK AVE | N OAK PARK AVE |
2626619 | 2626619 | 0 N OAK PARK AVE | N OAK PARK AVE |
2355590 | 2355590 | 0 N OAK PARK AVE | N OAK PARK AVE |
1894606 | 1894606 | 0 N OAK PARK AVE | N OAK PARK AVE |
26834144 | 26834144 | 0 LAKE COOK RD | LAKE COOK RD |
36691499 | 36691499 | 0 LAKE COOK RD | LAKE COOK RD |
35896968 | 35896968 | 0 LAKE COOK RD | LAKE COOK RD |
24035764 | 24035764 | 0 LAKE COOK RD | LAKE COOK RD |
36691499 | 36691499 | 0 LAKE COOK RD | LAKE COOK RD |
33978930 | 33978930 | 0 LAKE COOK RD | LAKE COOK RD |
42231002 | 42231002 | 0 LAKE COOK RD | LAKE COOK RD |
33978930 | 33978930 | 0 LAKE COOK RD | LAKE COOK RD |
26651297 | 26651297 | 0 LAKE COOK RD | LAKE COOK RD |
35896971 | 35896971 | 0 LAKE COOK RD | LAKE COOK RD |
38970697 | 38970697 | 0 LAKE COOK RD | LAKE COOK RD |
26210094 | 26210094 | 0 LAKE COOK RD | LAKE COOK RD |
24035765 | 24035765 | 0 LAKE COOK RD | LAKE COOK RD |
14401790 | 14401790 | 1601 S CICERO AVE | 1601 S CICERO AVE |
14932324 | 14932324 | 1601 S CICERO AVE | 1601 S CICERO AVE |
4343773 | 4343773 | 0 UNKNOWN | UNKNOWN |
5401103 | 5401103 | 0 CICERO | CICERO |
4343773 | 4343773 | 0 UNKNOWN | UNKNOWN |
Aggregate old and new views to count number of rows and average sale_n column by year. Compare aggregations and return and rows that disagree or don't join.
WITH old AS (
SELECT
sale_year,
count(*) AS old_count,
avg(sale_n) AS old_sales
FROM reporting.vw_town_sale_history
GROUP BY sale_year
),
new AS (
SELECT
sale_year,
count(*) AS new_count,
avg(sale_n) AS new_sales
FROM "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_town_sale_history
GROUP BY sale_year
)
SELECT
old.sale_year,
old_count,
new_count,
old_sales,
new_sales
FROM old
FULL OUTER JOIN new
ON old.sale_year = new.sale_year
WHERE old_count != new_count
OR old_sales != new_sales
OR old_count IS NULL
OR new_count IS NULL
Yields one row for 1973:
sale_year | old_count | new_count | old_sales | new_sales |
---|---|---|---|---|
1973 | 2 | 1 |
This difference is cause by using vw_pin_township_class
as the universe of pins by year for the new view rather than vw_pin_sale
.
Compare counts and average ratios across years
WITH new AS (
SELECT
year,
count(*) AS new_n,
avg(ratio) AS new_ratio
FROM "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".vw_ratio_stats
GROUP BY year
),
OLD AS (
SELECT
year,
count(*) AS old_n,
avg(ratio) AS old_ratio
FROM reporting.vw_ratio_stats
GROUP BY year
)
SELECT
new.year,
new_n,
old_n,
new_ratio,
old_ratio
FROM new
FULL OUTER JOIN old
ON new.year = old.year
year | new_n | old_n | new_ratio | old_ratio |
---|---|---|---|---|
2024 | 57008 | 57008 | 1.087891 | 1.087891 |
2023 | 231949 | 231949 | 0.949966 | 0.949966 |
2022 | 300219 | 300219 | 0.940188 | 0.940188 |
2021 | 205368 | 205368 | 0.949351 | 0.949351 |
We can't compare medians very well since they aren't deterministic, but we can compare other non-median columsn across the new and old CTAS (yrblt is a median column which explains the very small diffs).
WITH new AS (
SELECT
year,
LOWER(assessment_stage) AS assessment_stage,
count(*) AS new_n,
avg(pin_n) AS new_group_count,
avg(sale_n) AS new_sale_count,
avg(sale_max) AS new_sale_max,
avg(CAST(sale_year AS decimal)) AS new_sale_year,
avg(yrblt_median) as new_yrblt
FROM "z_ci_366-decompose-reporting-viewsctas-and-implmenet-dbt-tests_reporting".res_report_summary
GROUP BY year, assessment_stage
),
OLD AS (
SELECT
year,
LOWER(assessment_stage) AS assessment_stage,
count(*) AS old_n,
avg(pin_n) AS old_group_count,
avg(sale_n) AS old_sale_count,
avg(sale_max) AS old_sale_max,
avg(CAST(sale_year AS decimal)) AS old_sale_year,
avg(yrblt_median) as old_yrblt
FROM reporting.res_report_summary
GROUP BY year, assessment_stage
)
SELECT
new.year,
new.assessment_stage,
new_n,
old_n,
new_group_count,
old_group_count,
new_sale_count,
old_sale_count,
new_sale_max,
old_sale_max,
new_sale_year,
old_sale_year,
old_yrblt,
new_yrblt
FROM new
FULL OUTER JOIN old
ON new.year = old.year
AND new.assessment_stage = old.assessment_stage
year | assessment_stage | new_n | old_n | new_group_count | old_group_count | new_sale_count | old_sale_count | new_sale_max | old_sale_max | new_sale_year | old_sale_year | old_yrblt | new_yrblt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021 | bor certified | 2950 | 2950 | 1987.445 | 1987.445 | 96.85544 | 96.85544 | 646079.2 | 646079.2 | 2020 | 2020 | 1959.09 | 1959.09 |
2024 | model | 2952 | 2952 | 1983.967 | 1983.966 | 75.84529 | 75.84529 | 791360.9 | 791360.9 | 2023 | 2023 | 1959.138 | 1959.127 |
2023 | assessor certified | 2952 | 2952 | 1984.211 | 1984.211 | 98.52804 | 98.52804 | 777477.1 | 777477.1 | 2022 | 2022 | 1959.097 | 1959.1 |
2021 | mailed | 2950 | 2950 | 1987.364 | 1987.364 | 96.85544 | 96.85544 | 646079.2 | 646079.2 | 2020 | 2020 | 1959.085 | 1959.092 |
2023 | mailed | 2952 | 2952 | 1984.211 | 1984.211 | 98.52804 | 98.52804 | 777477.1 | 777477.1 | 2022 | 2022 | 1959.1 | 1959.099 |
2024 | mailed | 268 | 268 | 1301.164 | 1301.164 | 52.29268 | 52.29268 | 920229.9 | 920229.9 | 2023 | 2023 | 1938.511 | 1938.501 |
2023 | bor certified | 1277 | 1277 | 1467.787 | 1467.787 | 72.10644 | 72.10644 | 777222.7 | 777222.7 | 2022 | 2022 | 1962.209 | 1962.205 |
2022 | assessor certified | 3352 | 3352 | 1748.9 | 1748.9 | 106.5958 | 106.5958 | 724392.8 | 724392.8 | 2021 | 2021 | 1957.638 | 1957.637 |
2023 | model | 2947 | 2947 | 1982.579 | 1982.579 | 98.52804 | 98.52804 | 777477.1 | 777477.1 | 2022 | 2022 | 1959.126 | 1959.116 |
2022 | model | 2947 | 2947 | 1984.309 | 1984.309 | 106.5958 | 106.5958 | 724392.8 | 724392.8 | 2021 | 2021 | 1959.097 | 1959.103 |
2021 | assessor certified | 2950 | 2950 | 1987.445 | 1987.445 | 96.85544 | 96.85544 | 646079.2 | 646079.2 | 2020 | 2020 | 1959.09 | 1959.091 |
2022 | mailed | 3352 | 3352 | 1748.9 | 1748.9 | 106.5958 | 106.5958 | 724392.8 | 724392.8 | 2021 | 2021 | 1957.636 | 1957.638 |
2022 | bor certified | 3352 | 3352 | 1747.823 | 1747.823 | 106.5958 | 106.5958 | 724392.8 | 724392.8 | 2021 | 2021 | 1957.632 | 1957.628 |
@dfsnow As you mentioned, it would be a good idea to use ccao.class_dict
in lieu of the manual coding in vw_pin_township_class
. But ccao.class_dict
needs some updating before that can happen - we need to make sure it incorporates all possible classes from both pardat
and asmt_all
(they don't overlap completely) and we need to update it to accommodate class codes having A and B endings now (should have anticipated this when adding OA classes).
I think updating ccao.class_dict
and refactoring reporting.vw_pin_township_class
to use it would be a good PR on its own (it'd have to be pretty fast on the heels of https://github.com/ccao-data/data-architecture/pull/353, or precede it).
@dfsnow As you mentioned, it would be a good idea to use
ccao.class_dict
in lieu of the manual coding invw_pin_township_class
. Butccao.class_dict
needs some updating before that can happen - we need to make sure it incorporates all possible classes from bothpardat
andasmt_all
(they don't overlap completely) and we need to update it to accommodate class codes having A and B endings now (should have anticipated this when adding OA classes).I think updating
ccao.class_dict
and refactoringreporting.vw_pin_township_class
to use it would be a good PR on its own (it'd have to be pretty fast on the heels of #353, or precede it).
Excellent! @wrridgeway why don't you make a separate fast follow issue for this once #353 and this PR are merged.
Some of the reporting views and CTAS are rather unwieldy - let's decompose their CTEs into views that can be tested using dbt. Other views can be refactored using views from our default athena database to decrease the amount of code used.
CTAS to refactor/decompose: reporting.res_report_summary
Views to refactor/decompose: reporting.vw_ratio_stats reporting.vw_assessment_roll reporting.vw_top_5 reporting.vw_town_sale_history
Views to add: reporting.vw_pin_value_long reporting.vw_pin_township_class
This PR primarily shifts construction of common reporting groups out of each reporting view/CTAS and into
vw_pin_township_class
, which can be used to attach all common reporting groups to any pin/year combo. It also constructs a long version ofvw_pin_value
since multiple reporting views construct output at a stage-level.