ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Update `vw_pin_appeal` with new post-2020 logic cutoff #443

Closed dfsnow closed 1 month ago

dfsnow commented 1 month ago

It looks like our logic in vw_pin_appeal for including ~2020 appeals is potentially wrong (or the underlying data has changed). This PR updates the logic to take the correct set of columns for 2020.

Current output

You can see from the table below that the change and reason code columns are entirely null for 2020. Each cell is the null proportion for that year and column.

See query ```sql SELECT year, ROUND(SUM(CAST(change IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS change_np, ROUND(SUM(CAST(reason_code1 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason1_np, ROUND(SUM(CAST(reason_code2 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason2_np, ROUND(SUM(CAST(reason_code3 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason3_np, ROUND(SUM(CAST(agent_code IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS agent_code_np, ROUND(SUM(CAST(agent_name IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS agent_name_np FROM default.vw_pin_appeal WHERE year >= '2017' GROUP BY year ORDER BY year DESC ```
year change_np reason1_np reason2_np reason3_np agent_code_np agent_name_np
2024 0.093 0.093 0.962 0.998 0.22 0.22
2023 0.071 0.071 0.948 0.991 0.299 0.299
2022 0.001 0.001 0.963 0.994 0.167 0.167
2021 0.0 0.0 0.987 1.0 0.131 0.131
2020 1.0 1.0 1.0 1.0 1.0 1.0
2019 0.1 0.1 0.929 1.0 1.0 1.0
2018 0.158 0.158 0.846 1.0 1.0 1.0
2017 0.282 0.281 0.901 1.0 1.0 1.0

After PR

The updated view populates the change column for 2020 and thus reduces the null proportion. However, the other columns (reason codes, agent names) don't seem to be populated in any of the potential columns, so the null proportion is still 1.

See query ```sql SELECT year, ROUND(SUM(CAST(change IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS change_np, ROUND(SUM(CAST(reason_code1 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason1_np, ROUND(SUM(CAST(reason_code2 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason2_np, ROUND(SUM(CAST(reason_code3 IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS reason3_np, ROUND(SUM(CAST(agent_code IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS agent_code_np, ROUND(SUM(CAST(agent_name IS NULL AS int)) / CAST(COUNT(*) AS double), 3) AS agent_name_np FROM z_ci_dfsnow_fix_appeal_view_default.vw_pin_appeal WHERE year >= '2017' GROUP BY year ORDER BY year DESC ```
year change_np reason1_np reason2_np reason3_np agent_code_np agent_name_np
2024 0.093 0.093 0.962 0.998 0.22 0.22
2023 0.071 0.071 0.948 0.991 0.299 0.299
2022 0.001 0.001 0.963 0.994 0.167 0.167
2021 0.0 0.0 0.987 1.0 0.131 0.131
2020 0.007 1.0 1.0 1.0 1.0 1.0
2019 0.1 0.1 0.929 1.0 1.0 1.0
2018 0.158 0.158 0.846 1.0 1.0 1.0
2017 0.282 0.281 0.901 1.0 1.0 1.0

@ccao-jardine FYI