ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Fix multi-line detection in `default.vw_pin_condo_char` #318

Closed wrridgeway closed 5 months ago

wrridgeway commented 5 months ago

Multi line detection was previously defined pretty oddly, and did not seem like it was detecting what it was supposed to, namely multiple lines within a pin for a given year in the oby or comdat tables (where we draw residential and commercial condos from, respectively).

While the old method used pardat explicitly to avoid the problem that some condos seem only to exist in pardat and not in comdat or oby, pardat isn't actually a viable source for multi line detection.

This PR attempts to remedy that by counting rows in the WITH clauses that pull directly from oby and comdat. While it appears that this change is more in line with the behavior we expect, it does rather little ultimately since the number of multi line condos is apparently just very small.

wrridgeway commented 5 months ago
with old as (
    select
        year,
        class,
        sum(case when pin_is_multilline then 1 else 0 end) as pin_is_multilline_old,
        avg(case when cdu = 'GR' then 1 else 0 end) as cdu_old,
        count(*) as count_old
    from default.vw_pin_condo_char
    group by year, class
),

new as (
    select
        year,
        class,
        sum(case when pin_is_multilline then 1 else 0 end) as pin_is_multilline_new,
        avg(case when cdu = 'GR' then 1 else 0 end) as cdu_new,
        count(*) as count_new
    from "z_ci_317-fix-mult-land-line-detection-in-defaultvw-pin-condo-char_default".vw_pin_condo_char
    group by year, class
)

select
    old.year,
    old.class,
    count_old,
    count_new,
    cdu_old = cdu_new as cdu_parity,
    pin_is_multilline_old,
    pin_is_multilline_new
from old
left join new on old.year = new.year
    and old.class = new.class
year class count_parity cdu_parity pin_num_lline_old pin_num_lline_new pin_is_multilline_old pin_is_multilline_new
2024 399 TRUE TRUE 1 1 0 0
2024 299 TRUE TRUE 1.000002185 1.000004 1 2
2023 299 TRUE TRUE 1.000002185 1.000004 1 2
2023 399 TRUE TRUE 1 1 0 0
2022 399 TRUE TRUE 1 1 0 0
2022 299 TRUE TRUE 1.000002183 1.000004 1 2
2021 299 TRUE TRUE 1.000002188 1.000004 1 2
2021 399 TRUE TRUE 1 1 0 0
2020 399 TRUE TRUE 1 1 0 0
2020 299 TRUE TRUE 1.000002214 1 1 0
2019 399 TRUE TRUE 1 1 0 0
2019 299 TRUE TRUE 1.000002214 1.000007 1 3
2018 299 TRUE TRUE 1.000002214 1 1 0
2018 399 TRUE TRUE 1 1 0 0
2017 299 TRUE TRUE 1.000002206 1 1 0
2017 399 TRUE TRUE 1 1 0 0
2016 299 TRUE TRUE 1.000002208 1 1 0
2016 399 TRUE TRUE 1 1 0 0
2015 399 TRUE TRUE 1 1 0 0
2015 299 TRUE TRUE 1.000002209 1 1 0
2014 299 TRUE TRUE 1.000002209 1 1 0
2014 399 TRUE TRUE 1 1 0 0
2013 399 TRUE TRUE 1 1 0 0
2013 299 TRUE TRUE 1.000002208 1.000004 1 2
2012 399 TRUE TRUE 1 1 0 0
2012 299 TRUE TRUE 1.000002204 1 1 0
2011 299 TRUE TRUE 1.000002202 1 1 0
2011 399 TRUE TRUE 1 1 0 0
2010 299 TRUE TRUE 1.00000221 1 1 0
2010 399 TRUE TRUE 1 1 0 0
2009 399 TRUE TRUE 1 1 0 0
2009 299 TRUE TRUE 1.000002255 1.000025 1 1
2008 299 TRUE TRUE 1.000002341 1 1 0
2008 399 TRUE TRUE 1 1 0 0
2007 299 TRUE TRUE 1.000002496 1.000362 1 145
2007 399 TRUE TRUE 1 1 0 0
2006 299 TRUE TRUE 1.00000271 1 1 0
2006 399 TRUE TRUE 1 1 0 0
2005 299 TRUE TRUE 1.000002928 1 1 0
2005 399 TRUE TRUE 1 1 0 0
2004 299 TRUE TRUE 1.00000313 1 1 0
2004 399 TRUE TRUE 1 1 0 0
2003 399 TRUE TRUE 1 1 0 0
2003 299 TRUE TRUE 1.000003341 1 1 0
2002 399 TRUE TRUE 1 1 0 0
2002 299 TRUE TRUE 1.000003549 1 1 0
2001 399 TRUE TRUE 1 1 0 0
2001 299 TRUE TRUE 1.000003775 1 1 0
2000 299 TRUE TRUE 1.000004023 1.278704 1 69269
2000 399 TRUE TRUE 1 1.494325 0 871
1999 399 TRUE TRUE 1 1 0 0
1999 299 TRUE TRUE 1.000004278 1 1 0