ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Consider using `iasworld.legdat.xcoord`/`ycoord` for parcel centroid location #352

Closed wrridgeway closed 2 months ago

wrridgeway commented 3 months ago

Unbeknownst to us, parcel centroids are in legdat, in EPSG:3435.

wrridgeway commented 3 months ago

Number of null/non-null coords by year

select
    taxyr, xcoord is null as xcoord_null, count(*) AS count
from iasworld.legdat
group by taxyr, xcoord is null
order by taxyr desc
taxyr xcoord_null count
2024 FALSE 1,859,107
2024 TRUE 5,200
2023 FALSE 1,864,915
2023 TRUE 5,143
2022 FALSE 1,864,588
2022 TRUE 5,211
2021 TRUE 3,231
2021 FALSE 1,864,588
2020 TRUE 1,864,035
2019 TRUE 1,864,102
2018 TRUE 1,864,590
2017 TRUE 1,864,621
2016 TRUE 1,863,996
2015 TRUE 1,862,756
2014 TRUE 1,862,170
2013 TRUE 1,861,935
2012 TRUE 1,862,181
2011 TRUE 1,861,008
2010 TRUE 1,859,244
2009 TRUE 1,849,650
2008 TRUE 1,832,618
2007 TRUE 1,804,219
2006 TRUE 1,771,314
2005 TRUE 1,740,965
2004 TRUE 1,716,218
2003 TRUE 1,693,153
2002 TRUE 1,673,735
2001 TRUE 1,655,398
2000 TRUE 1,637,269
1999 TRUE 1,620,990
wrridgeway commented 3 months ago

There are some differences:

select
    vpu.pin, vpu.year,
    round(vpu.x_3435, 0) as x_3435, leg.xcoord,
    abs(round(vpu.x_3435, 0) - leg.xcoord) as x_diff,
    round(vpu.y_3435, 0) as y_3435, leg.ycoord,
    abs(round(vpu.y_3435, 0) - leg.ycoord) as y_diff
from default.vw_pin_universe vpu
left join iasworld.legdat leg
    on vpu.pin = leg.parid
    and vpu.year = leg.taxyr
where vpu.pin = '19161000028059'
    and vpu.year = '2023'
pin year x_3435 xcoord x_diff y_3435 ycoord y_diff
19161000028059 2023 1146184 1143043 3141 1864535 1865095 560

Number of matching/non-matching coords by year

with match as (
    select
        vpu.pin, vpu.year,
        round(vpu.x_3435, 0) = leg.xcoord and round(vpu.y_3435, 0) = leg.ycoord as match
    from default.vw_pin_universe vpu
    left join iasworld.legdat leg
        on vpu.pin = leg.parid
        and vpu.year = leg.taxyr
    where leg.xcoord is not null
        and vpu.x_3435 is not null
)

select
    match, year, count(*) as count
from match
group by match, year
order by count(*) desc
match year count
TRUE 2021 1,800,172
TRUE 2022 1,796,632
TRUE 2023 1,793,904
FALSE 2023 64,902
FALSE 2021 63,663
FALSE 2022 63,355
dfsnow commented 2 months ago

Considering that the parcel-file-based coordinates imputed by us go back farther than 2021 AND most of the differences between files are less than a couple hundred feet, I'm inclined to leave things as they are. @wrridgeway what do you think? If we're agreed then we can close this IMO.

wrridgeway commented 2 months ago

I'm in agreement. This will be really helpful for one-offs, but the gain from switching over completely seems minimal.