NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Questions #14

Open AmandaDoyle opened 2 years ago

AmandaDoyle commented 2 years ago

New Questions

Other data Questions

  1. Once indicator list is finalized, what kind of data sources will we need for certain indicators (outside of ACS and HVS)?

    • And how can we build infrastructure to handle future/updated releases of EDDT + data?
    • How often is the data released?
  2. Survey response rates and standard errors. I want to check this with an SME when we have a chance. I calculated the median age by race and checked my work in PUMA 4011. It's a PUMA that the 2019 5-year estimates put at 55% black non-Hispanic, 31% white non-Hispanic, 2.5% Asian, 8.7% Hispanic, and 3.0% "other" (none of the first 4 categories). I chose a PUMA in central Brooklyn to check median ages by race because across that part of the city there are many neighborhoods with young white gentrifiers and a longer-standing population of black residents. The median age by race does match this pattern. What surprised me is that the standard error for median black non-Hispanic age is 2.13 years, whereas it's less than one year for Hispanics and white/Asian non-Hispanics. I realized this was because of survey response rates, and sure enough there are only 3.4 responses per 100 black residents compared to around 5 responses per 100 white/Asian residents. That all passes the smell test. What gives me some pause is that there are only 3.8 responses per 100 Hispanic residents, yet the standard error for that group is less than half that of black non-Hispanic residents. Could it be a better age distribution among those who completed the survey? I just want to make sure I fully understand what I'm doing; I don't think it's important enough to ping anyone at this moment.

  3. There is a group of 7 PUMAs for which the SE on median age is 0. The next lowest is 0.222. This doesn't pass the smell test to me.
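One possible explanation for the zero standard errors, not confirmed anywhere in this thread: if age is recorded in whole years (as with the PUMS `AGEP` variable) and the SE comes from the 80 replicate weights, every replicate median can land on the same integer, which makes the replicate-based SE exactly 0. The sketch below illustrates this with synthetic ages and synthetic replicate weights; the helper names `weighted_median` and `replicate_se` are hypothetical and not taken from the repo.

```python
import math

N_REPLICATES = 80  # PUMS person files carry PWGTP plus PWGTP1..PWGTP80


def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


def replicate_se(full_estimate, replicate_estimates):
    """Replicate-weight standard error: sqrt(4/80 * sum of squared deviations)."""
    assert len(replicate_estimates) == N_REPLICATES
    squared = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4 / N_REPLICATES * squared)


# Synthetic toy data (not real PUMS records): whole-year ages and full weights.
ages = [25, 31, 31, 31, 40, 62]
full_weights = [10, 12, 11, 9, 10, 8]

# Synthetic replicate weights: scale each person's weight up or down by 50%.
replicate_weights = [
    [w * (1.5 if (i + r) % 2 == 0 else 0.5) for i, w in enumerate(full_weights)]
    for r in range(N_REPLICATES)
]

full_median = weighted_median(ages, full_weights)
replicate_medians = [weighted_median(ages, rw) for rw in replicate_weights]
se = replicate_se(full_median, replicate_medians)
print(full_median, se)
```

If this is what is happening, an SE of 0 would reflect the coarseness of integer ages rather than a perfectly precise estimate, which is still worth confirming with an SME.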

    Archived Questions

    • [x] How do ACS PUMS INDP and OCCP variables map to the categories listed in the "Employment by industry sector" and "Employment by occupation" indicators in the "household economic security" sheet?

      See 2015-2019 ACS 5-year PUMS Code Lists here

SashaWeinstein commented 2 years ago

I had a question on how to calculate the variance of a fraction that I sent to Erica yesterday. In the Intro 1572-B data matrix - Current that I've been working off of in November there is a "denominator" column in the field specifications. I assumed this meant that indicators for which there is a denominator are fractions.

Erica said, however, that the "Limited English Proficiency" (lep_19) variable is not a fraction and that the denominator just specifies a "filter."

I still don't know how to calculate the variance of a fraction using replicate weights but I don't know if I will need to after all.
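For reference, the usual replicate-weight approach for a fraction is to recompute the fraction once per replicate weight and then apply the same formula used for any other PUMS estimate, SE = sqrt(4/80 × Σ(θ_r − θ)²). Below is a minimal sketch under that assumption, using synthetic records and synthetic replicate weights; `weighted_fraction` and `fraction_se` are hypothetical names.

```python
import math

N_REPLICATES = 80  # ACS PUMS ships 80 replicate weights (PWGTP1..PWGTP80)


def weighted_fraction(in_numerator, in_denominator, weights):
    """Weighted share of the denominator universe that is also in the numerator."""
    num = sum(w for n, d, w in zip(in_numerator, in_denominator, weights) if n and d)
    den = sum(w for d, w in zip(in_denominator, weights) if d)
    return num / den


def fraction_se(in_numerator, in_denominator, full_weights, replicate_weights):
    """Return (fraction, standard error) using the 4/80 replicate formula."""
    assert len(replicate_weights) == N_REPLICATES
    full = weighted_fraction(in_numerator, in_denominator, full_weights)
    squared = sum(
        (weighted_fraction(in_numerator, in_denominator, rw) - full) ** 2
        for rw in replicate_weights
    )
    return full, math.sqrt(4 / N_REPLICATES * squared)


# Synthetic example: four people, two of them in the numerator.
in_num = [True, False, True, False]
in_den = [True, True, True, True]
weights = [10, 10, 10, 10]

# Synthetic replicates: only the first person's weight moves (15 or 5).
replicates = [[15 if r % 2 == 0 else 5, 10, 10, 10] for r in range(N_REPLICATES)]

fraction, se = fraction_se(in_num, in_den, weights, replicates)
print(fraction, se)
```

Because the numerator and denominator are recomputed together for each replicate, their covariance is accounted for without a separate ratio formula.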

The most important thing I learned is that the data matrix is still a work in progress, so I should focus on doing what I know how to do for sure and then fill in the gaps when it's finalized in mid-December.

SashaWeinstein commented 2 years ago

Employment by occupation, industry sector

The two ACS PUMS variables for industry and occupation I can find are INDP (Industry recode for 2018 and later based on 2017 IND codes (INDP)) and OCCP (Occupation recode for 2018 and later based on 2018 OCC codes (OCCP)). Both are much more granular than the categories listed in the "Employment by occupation" line in the "household economic security" sheet of the data matrix.

Is the idea that there will be a recode to map each code in INDP or OCCP in the PUMS data to one of the categories in the sheet?
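If a recode is the plan, it could be as simple as a range lookup. The sketch below is only an illustration: the ranges and category labels are placeholders rather than real INDP/OCCP boundaries, and the function name `recode` is hypothetical. The real table would have to come from the ACS PUMS code lists and the finalized data matrix. It also assumes codes arrive as zero-padded strings and that a blank means "not applicable".

```python
# Placeholder ranges only; NOT the actual INDP/OCCP groupings.
PLACEHOLDER_RECODE = [
    (1, 999, "category_a"),
    (1000, 1999, "category_b"),
]


def recode(code, table=PLACEHOLDER_RECODE):
    """Map a PUMS code (e.g. "0010") to a category label via inclusive ranges."""
    if code is None or str(code).strip() == "":
        return None  # assumed: blank means not applicable
    value = int(code)
    for low, high, label in table:
        if low <= value <= high:
            return label
    return "unmapped"  # surfaces codes the table does not cover yet


print(recode("0010"), recode("1500"), recode("9999"), recode(""))
```

Returning "unmapped" instead of raising makes it easy to list which codes still need a category once the sheet is finalized.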

I'm hesitant to add variables to the ingestion pipeline before it's confirmed that I'm using the correct variables and we know how they will be used to create indicators. My current plan is to add PUMS variables that are definitely used for the household economic security indicators and wait on those that are not clear.

SashaWeinstein commented 2 years ago

I have a question that may need a subject matter expert. For some data points we don't have any responses that fall into that category. In the testing data, for example, there are zero black non-Hispanic respondents with limited English proficiency in PUMA 3901 (southern Staten Island). The "count" measure for this is of course zero. It's also reported as a fraction of all limited English proficiency people, i.e., the fraction of all people with limited English proficiency in southern Staten Island who are black non-Hispanic. This should be zero too: 0% of those New Yorkers are black non-Hispanic. But how do we report the variance of these data points? Are the margin of error and coefficient of variation N/A?
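To make the arithmetic concrete: with no respondents in a cell, the estimate is 0 under every replicate weight, so the replicate formula gives an SE of 0 and a margin of error of 0, while the coefficient of variation (SE divided by the estimate) is undefined. The sketch below shows only that arithmetic, assuming the 80-replicate formula and the 90% confidence multiplier of 1.645 that ACS uses; the name `summarize` is hypothetical. Whether to publish 0 or N/A is still a question for an SME.

```python
import math

N_REPLICATES = 80
Z_90 = 1.645  # ACS margins of error use the 90% confidence level


def summarize(estimate, replicate_estimates):
    """Return (se, moe, cv_percent); cv is None when the estimate is zero."""
    assert len(replicate_estimates) == N_REPLICATES
    squared = sum((r - estimate) ** 2 for r in replicate_estimates)
    se = math.sqrt(4 / N_REPLICATES * squared)
    moe = Z_90 * se
    cv = None if estimate == 0 else se / estimate * 100  # avoid dividing by zero
    return se, moe, cv


# A cell with zero respondents: the estimate is 0 under every replicate weight.
se, moe, cv = summarize(0.0, [0.0] * N_REPLICATES)
print(se, moe, cv)
```

An SE of 0 here reflects the absence of sample in the cell, not certainty that the true value is zero, which is one argument for flagging these cells rather than reporting a zero margin of error.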