NYCPlanning / db-factfinder

data ETL for population fact finder (decennial + acs)
https://nycplanning.github.io/db-factfinder/factfinder/
MIT License

Inconsistencies -> PM PE for profile only variables #56

Closed SPTKL closed 3 years ago

SPTKL commented 3 years ago

Solution:

After discussing with populations, we found that certain variables don't follow the rules below. There is a list of variables that:

  1. have PE/PM provided by the Census Bureau
  2. but for which we still calculate the PE/PM against a different base (compared to the Census Bureau)

Note: this list of variables doesn't change over time, so we should try to create it by observation.
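A minimal sketch of how such a special-case list could drive the choice between the API-provided percent and a recalculated one. The set contents, the function name `choose_percent`, and its parameters are hypothetical and are not taken from pff-factfinder; `cvlfuem2` is used only because it is the example in this issue.

```python
# Hypothetical special-case list: profile variables whose percent is
# recalculated against a different base instead of taken from the API.
SPECIAL_CASES = {"cvlfuem2"}


def choose_percent(variable, api_p, e, base_e, special_cases=SPECIAL_CASES):
    """Return the percent estimate to publish for one profile variable.

    variable -- pff variable name
    api_p    -- percent estimate returned by the census API
    e        -- estimate of the variable
    base_e   -- estimate of the alternative base variable
    """
    if variable in special_cases:
        if not base_e:
            # A missing or zero base gives no meaningful percent.
            return None
        return 100 * e / base_e
    # Every other profile variable keeps the API-provided value.
    return api_p
```

Keeping the list as plain data fits the note above: if the list does not change over time, it can be built once by observation and reviewed with populations.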

Issue:

For profile-only variables (DP) at a non-aggregated geographic level, PE (P) and PM (Z) come from the API. Currently, taking the tract-level variable cvlfuem2 as an example, the PE and PM are taken directly from the census API:

image

However, based on observations from the database, both the EDM version (y2014-2018) and the population version (y2014-2018-erica) use the calculated PE and PM. Issues like this seem to persist in many other variables. (Variables prefixed with _ are from the last release; variables without _ are produced using pff-factfinder.)

image

SPTKL commented 3 years ago

Special cases:

-- List variables whose E and M match the last release
-- but whose P or Z differ by more than 0.1.
select distinct pff_variable
from (
    select
        a.census_geoid,
        a.geoid,
        a.pff_variable,
        a.c, a.e, a.m, a.p, a.z,
        b.c as _c, b.e as _e, b.m as _m, b.p as _p, b.z as _z
    from (
        -- Build the borough-based tract geoid:
        -- borough code + last 6 characters of the census geoid.
        select *, (CASE
                WHEN LEFT(a.census_geoid, 5) = '36005' THEN 2
                WHEN LEFT(a.census_geoid, 5) = '36047' THEN 3
                WHEN LEFT(a.census_geoid, 5) = '36061' THEN 1
                WHEN LEFT(a.census_geoid, 5) = '36081' THEN 4
                WHEN LEFT(a.census_geoid, 5) = '36085' THEN 5
            END)||RIGHT(a.census_geoid, 6) as geoid
        from pff_acs."2018-test" a
        where a.geotype = 'tract'
    ) a
    -- Join the new output to the last release on variable and tract.
    join pff_acs."2018" b
    on a.pff_variable = lower(b.variable)
    and a.geoid = b.geoid
    and b.geotype = 'CT2010'
) a
where e = _e and m = _m and (abs(p - _p) > 0.1 or abs(z - _z) > 0.1)
order by pff_variable;

abroad      1545
cvlfuem2    1792
dfhsdfcnt   1998
dfhssmcnt   2091
dfhsus      2110
hh5         11
oochu4      6
p65plbwpv   608
pbwpv       189
pu18bwpv    804
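The two pieces of logic in the query above, the geoid conversion and the mismatch filter, can be sketched in Python for spot-checking individual rows. This is an illustration only: the names `COUNTY_TO_BORO`, `to_boro_geoid`, and `is_inconsistent`, and the dict-based row shape, are assumptions and not part of pff-factfinder.

```python
# County FIPS prefix -> borough code, copied from the CASE expression.
COUNTY_TO_BORO = {
    "36005": "2",
    "36047": "3",
    "36061": "1",
    "36081": "4",
    "36085": "5",
}


def to_boro_geoid(census_geoid):
    """Borough code followed by the last 6 characters of the census geoid."""
    boro = COUNTY_TO_BORO.get(census_geoid[:5])
    if boro is None:
        # Mirrors SQL: an unmatched CASE yields NULL, and NULL || text is NULL.
        return None
    return boro + census_geoid[-6:]


def _differs(a, b, tol):
    """True when both values are present and differ by more than tol."""
    return a is not None and b is not None and abs(a - b) > tol


def is_inconsistent(new, old, tol=0.1):
    """Same E and M as the last release, but P or Z moved by more than tol.

    new, old -- dicts with keys "e", "m", "p", "z" (hypothetical row shape).
    """
    same_estimate = new["e"] is not None and new["e"] == old["e"]
    same_moe = new["m"] is not None and new["m"] == old["m"]
    return (
        same_estimate
        and same_moe
        and (
            _differs(new["p"], old["p"], tol)
            or _differs(new["z"], old["z"], tol)
        )
    )
```

Missing values are treated as "no difference" so that rows with nulls drop out, as they do in the SQL `where` clause.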

SPTKL commented 3 years ago

Confirmed that none of these variables are used by community profiles. We will not fix the package at this moment, and will fix it in mid-January when we get to work closely with the populations division.

SPTKL commented 3 years ago

This is still an issue.

SPTKL commented 3 years ago

Mostly resolved the differences in P, but there are still issues in the Z calculation. Looking into cvniu18d:

{
    "pff_variable": "cvniu18d",
    "base_variable": "cvniu18_1",
    "census_variable": [
        "DP02_0073"
    ],
    "domain": "social",
    "rounding": 0,
    "source": "profile"
}

image

previous: image

current: image

image
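For reference when debugging Z, here is a sketch of recomputing P and Z for a variable against the `base_variable` named in its metadata entry. It uses the Census Bureau's published formula for the margin of error of a derived proportion, falling back to the ratio formula when the value under the square root is negative. Whether pff-factfinder computes Z exactly this way is an assumption, and the function name `percent_and_moe` is hypothetical.

```python
import json
import math

# Metadata entry for cvniu18d, as quoted in this comment.
ENTRY = json.loads("""
{
    "pff_variable": "cvniu18d",
    "base_variable": "cvniu18_1",
    "census_variable": ["DP02_0073"],
    "domain": "social",
    "rounding": 0,
    "source": "profile"
}
""")


def percent_and_moe(e, m, base_e, base_m):
    """Return (P, Z) in percent for estimate e with MOE m over a base.

    Assumes the standard ACS derived-proportion formula:
        Z = sqrt(m^2 - p^2 * base_m^2) / base_e
    and, if the radicand is negative, the ratio form with a plus sign.
    """
    if not base_e:
        # A missing or zero base gives no meaningful percent.
        return None, None
    ratio = e / base_e
    radicand = m ** 2 - ratio ** 2 * base_m ** 2
    if radicand < 0:
        radicand = m ** 2 + ratio ** 2 * base_m ** 2
    return 100 * ratio, 100 * math.sqrt(radicand) / base_e


# The base to divide by is looked up from the metadata, not hard-coded.
base_name = ENTRY["base_variable"]
```

Because Z depends on both the base estimate and the base margin of error, a P that already matches the last release does not guarantee a matching Z; a different `base_variable` with a similar estimate but a different MOE would produce exactly this pattern.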

SPTKL commented 3 years ago

Mostly resolved, except for p65plbwpv and pu18bwpv, which are tracked in #72.