ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Accommodate IHS Housing Index continuing use of 2010 PUMAs #283

Closed wrridgeway closed 6 months ago

wrridgeway commented 6 months ago

The DePaul IHS releases quarterly housing index data at the census PUMA level; problematically for us this data continues to use 2010 PUMA geographies which we don't have for any parcels that didn't exist before 2022.

This PR creates a new table in the location Athena DB named pin10_2010_puma that joins 2010 PUMA geographies to all unique PIN10s. This table is then used as an intermediate step to join the IHS housing index to modeling views by PIN10 and year.

wrridgeway commented 6 months ago

I believe this is good to go. @dfsnow You'll probably want some naming changes.

WITH new AS (
    select
        meta_year as year,
        count(*) AS new_n
    from "z_ci_282-accommodate-ihs-housing-index-continuing-use-of-2010-pumas_model".vw_pin_shared_input
    group by meta_year
    ),
old AS (
    select
        meta_year as year,
        count(*) as old_n
    from model.vw_pin_shared_input
    group by meta_year
)

SELECT
    old.*,
    new.new_n
FROM old
LEFT JOIN new
    ON old.year = new.year
year old_n new_n
2023 1865414 1865414
2022 1866255 1866255
2021 1865833 1865833
2020 1865221 1865221
2019 1865288 1865288
2018 1865760 1865760
2017 1865766 1865766
2016 1865134 1865134
2015 1863932 1863932
2014 1863337 1863337
2013 1863099 1863099
2012 1863332 1863332
2011 1862169 1862169
2010 1860408 1860408
2009 1850798 1850798
2008 1833755 1833755
2007 1805334 1805334
2006 1772414 1772414
2005 1742100 1742100
2004 1716288 1716288
2003 1693209 1693209
2002 1673779 1673779
2001 1655424 1655424
2000 1637289 1637289
1999 1620990 1620990
WITH new AS (
    select
    meta_year,
    count(*) as new_n
from "z_ci_282-accommodate-ihs-housing-index-continuing-use-of-2010-pumas_model".vw_pin_shared_input
where other_ihs_avg_year_index IS NULL
    group by meta_year
    ),
old AS (
    select
    meta_year,
    count(*) as old_n
from "model".vw_pin_shared_input
where other_ihs_avg_year_index IS NULL
    group by meta_year
)

SELECT
    old.*,
    new.new_n
FROM old
LEFT JOIN new
    ON old.meta_year = new.meta_year
order by old.meta_year desc
meta_year old_n new_n
2023 1865414 109410
2022 1866255 110663
2021 109999 109971
2020 109980 109956
2019 111953 110060
2018 111141 110600
2017 110954 110401
2016 111339 110048
2015 109790 109686
2014 109718 109369
2013 109666 108982
2012 109268 108952
2011 1862169 108609
2010 1860408 109483
2009 1850798 101770
2008 1833755 93200
2007 1805334 86396
2006 1772414 79283
2005 1742100 71109
2004 1716288 65279
2003 1693209 59057
2002 1673779 53329
2001 1655424 49783
2000 1637289 49390
1999 1620990 51726
wrridgeway commented 6 months ago

Also, FYI, this new table is not unique by PIN10 after the spatial join due to stuff like this:

image

It isn't an issue for the modeling views since we average the housing index within pin10 and year. There are also a total of 42 parcels this is an issue for, so not a huge deal.

wrridgeway commented 6 months ago

Updated to use 2012. Athena queries still look good.