NYCPlanning / db-acs

American Community Survey data processing for Population Fact Finder
4 stars 1 forks source link

2014-2018 new issues #28

Open SPTKL opened 4 years ago

SPTKL commented 4 years ago

so to have it all in the same place of the errors i've found so far in my checks:

select * from (
select * from (
select *,
(case when ac is null then 1 else 0 end)+(case when bc is null then 1 else 0 end) as c_,  
(case when ae is null then 1 else 0 end)+(case when be is null then 1 else 0 end) as e_, 
(case when am is null then 1 else 0 end)+(case when bm is null then 1 else 0 end) as m_,
(case when ap is null then 1 else 0 end)+(case when bp is null then 1 else 0 end) as p_,
(case when az is null then 1 else 0 end)+(case when bz is null then 1 else 0 end) as z_
from(
select a.geoid, a.variable, 
a.c as ac, 
a.e as ae,
a.m as am,
a.p as ap,
a.z as az,
b.c as bc, 
b.e as be,
b.m as bm,
b.p as bp,
b.z as bz from 
pff_housing."Y2014-2018" a
JOIN pff_housing."Y2014-2018-erica" b
ON a.variable = b.variable AND a.geoid = b.geoid) c) d
where c_=1 or e_ = 1 or m_=1 or p_=1 or z_=1) e
WHERE bm is not null and am is null;

SELECT * FROM (
SELECT a.geoid,a.variable, 
round(a.c-b.c) as c_diff, 
round(a.e-b.e) as e_diff, 
round(a.m-b.m) as m_diff, 
round(a.p-b.p) as p_diff, 
round(a.z-b.z) as z_diff, 
'' as gap0,
a.c, b.c AS c_,
'' as gap1,
a.e, b.e AS e_,
'' as gap2,
a.m, b.m AS m_,
'' as gap3,
a.p, b.p AS p_,
'' as gap4,
a.z, b.z AS z_
FROM pf."Y2014-2018-erica" a
RIGHT JOIN pff_economic."Y2014-2018" b 
ON a.geoid = b.geoid AND a.variable = b.variable
WHERE a.c != b.c
OR a.c != b.c
OR a.e != b.e
OR a.m != b.m
OR a.p != b.p
OR a.z != b.z) dif
WHERE c_diff != 0
    or e_diff != 0
    or m_diff != 0
    or p_diff != 0
    or z_diff != 0;
SPTKL commented 4 years ago

ECON:

update: CvLFUEm2: if we recalculated p and z, mismatch resolves

status:

Good morning, I looked into the Median Moes, they all seem to have the same issue. Following the instruction, sometimes P_upper falls in the last bin interval, and we can't find A2 because there's no next higher category, that's why the MOE is left as null because the calculation cannot proceed image

e.g. for MN21, mdefftwrk

here's the cumulative distribution

[0.20990764063811923, 0.20990764063811923, 0.20990764063811923, 0.20990764063811923, 0.6856982927511894, 0.8396305625524769, 1.035544360481388, 1.56731038343129, 1.9031626084522808, 2.127064091799608, 2.65883011474951, 3.456479149174363, 3.8902882731598094, 7.514693534844668, 9.599776098516653, 15.938986845787854, 25.98656591099916, 34.1589700531766, 52.54687937307585, 100.0]

p_lower = 46.55054141297496
p_upper = 53.44945858702504

p_upper falls in the last bin available, therefore cannot define A2 in this case

SPTKL commented 4 years ago

SOCIAL:

reason: for these country variables, base variable Pop_5 has -555555555 as m value from census API, which gets changed to null

status:

SPTKL commented 4 years ago

HOUSING:

same reason see below

status:

SPTKL commented 4 years ago

DEMO:

base variables have null MOE pulled from census e.g. for

"Pop": [
        "B01001_001"
    ]

image

status:

image

EricaMaurer commented 4 years ago

Also seeing top coding + 1 issue on economic profile and medians missing m & c data

RE:

data pulled directly from census:

('DP03_0093E', '250001')
('DP03_0093M', '-333333333')
('DP03_0093PE', '-888888888')
('DP03_0093PM', '-888888888')
('DP03_0093EA', '250,000+')
('DP03_0093MA', '***')
('DP03_0093PEA', '(X)')
('DP03_0093PMA', '(X)')
EricaMaurer commented 4 years ago

RE: resolved

EricaMaurer commented 4 years ago
SPTKL commented 4 years ago

everything seems to be resolved, only remaining issue is and QN99 missing c and m for mdrms still need some digging to resolve that one record, otherwise everything else is ready to go row counts are consistent across all 3 years for each table

SPTKL commented 4 years ago

LAST FEW ISSUES:

image