ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Remove dummy PINs from `vw_pin_universe` #342

Closed wrridgeway closed 3 months ago

wrridgeway commented 3 months ago

See https://github.com/ccao-data/public/issues/25.

There are a few PINs in the iasWorld backend used for testing new features, calculations, etc. We want these in the raw ias data but probably don't need to expose them via our primary views. Write a quick regular expression/filter to catch and exclude the mentioned test PIN, then add a dbt test/regex to test any further malformed PINs and add it to the QC workbooks.

wrridgeway commented 3 months ago

@dfsnow

with old as (
select year, count(*) as old
from default.vw_pin_universe
group by year
),
new as (
select year, count(*) as new
from "z_ci_340-remove-dummy-pins-from-vw-pin-universe_default".vw_pin_universe
group by year
)

select old.year, old.old - new.new as diff
from old
left join new on old.year = new.year
order by old.year desc
year diff
2024 1
2023 1
2022 0
2021 0
2020 0
2019 0
2018 0
2017 0
2016 0
2015 0
2014 0
2013 0
2012 0
2011 0
2010 0
2009 0
2008 0
2007 0
2006 0
2005 0
2004 0
2003 0
2002 0
2001 0
2000 0
1999 0
wrridgeway commented 3 months ago

vw_pin_address

with old as (
select year, count(*) as old
from default.vw_pin_address
group by year
),
new as (
select year, count(*) as new
from "z_ci_340-remove-dummy-pins-from-vw-pin-universe_default".vw_pin_address
group by year
)

select old.year, old.old - new.new as diff
from old
left join new on old.year = new.year
order by old.year desc
year diff
2024 1
2023 1
2022 0
2021 0
2020 0
2019 0
2018 0
2017 0
2016 0
2015 0
2014 0
2013 0
2012 0
2011 0
2010 0
2009 0
2008 0
2007 0
2006 0
2005 0
2004 0
2003 0
2002 0
2001 0
2000 0
1999 0

vw_pin_exempt

with old as (
select year, count(*) as old
from default.vw_pin_appeal
group by year
),
new as (
select year, count(*) as new
from "z_ci_340-remove-dummy-pins-from-vw-pin-universe_default".vw_pin_appeal
group by year
)

select old.year, old.old - new.new as diff
from old
left join new on old.year = new.year
order by old.year desc
year diff
2024 0
2023 0
2022 0
wrridgeway commented 3 months ago

vw_pin_appeal was experiencing larger swings in the difference between yearly row counts when filtering out pardat test PINs because htpar can include rows for pins for years outside of where they exist in pardat (think COE for a deactivated PIN). putting a condition on pardat pins after they're left joined onto htpar removes htpar rows that don't have a matched row from pardat. since pardat isn't actually the source of truth for appeals i've removed the appeal view from this PR.