AmandaDoyle commented 2 years ago

New Questions

[ ] In the housing security and quality category there are 3 indicators that are not required by legislation; therefore, we're not working on them. There are 5 indicators that are required by legislation but the design is not complete. SME is HPD or DHS.
[ ] In the household economic security category there are 7 indicators that are required by legislation but the design is not complete. SME is HPD/DCP Population Division. There are 3 indicators that are not required by legislation; therefore, we're not working on them. Is "Poverty" required by legislation?
[ ] In the QOL and opportunity category there are 3 indicators that are high priority but the design is not complete. SME is unspecified.
[ ] For data points that need to be reported out as both a count and percent
1. Who is responsible for calculating the percent? EDM or Digital Services?
2. If there is a value in the Denominator column does that mean that the data point needs to be reported out as a count and a percent? If so, the field specified in the Denominator column in the Field specifications tab acts as the denominator value for the calculation, correct? Can some values in the Denominator column be simplified to total population (i.e. nativity(1,2))?
3. How do we capture this information for all indicators?
[ ] Standard error issue for median calculation for HVS and PUMS
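
For the count-and-percent question above, here is a minimal sketch of how a Denominator column could drive the calculation, whichever team ends up owning it. It assumes the denominator is a filter on person records and the percent is the weighted count divided by the weighted size of that filtered group. `PWGTP` is the PUMS person weight; the `foreign_born` and `lep` fields and all numbers are made up for illustration.

```python
def weighted_count(records, predicate, weight="PWGTP"):
    """Sum the person weights of the records that satisfy the predicate."""
    return sum(r[weight] for r in records if predicate(r))


def count_and_percent(records, in_numerator, in_denominator, weight="PWGTP"):
    """Return (count, percent); percent is None when the denominator group is empty."""
    denom = weighted_count(records, in_denominator, weight)
    count = weighted_count(
        records, lambda r: in_denominator(r) and in_numerator(r), weight
    )
    percent = 100.0 * count / denom if denom else None
    return count, percent


# Toy records (hypothetical fields and values, not real PUMS data)
records = [
    {"PWGTP": 20, "foreign_born": True, "lep": True},
    {"PWGTP": 30, "foreign_born": True, "lep": False},
    {"PWGTP": 50, "foreign_born": False, "lep": False},
]

# Denominator simplified to total population
total = count_and_percent(records, lambda r: r["lep"], lambda r: True)
# Denominator restricted to a subgroup
subgroup = count_and_percent(records, lambda r: r["lep"], lambda r: r["foreign_born"])
```

The same count gives a different percent depending on the denominator, which is why the Denominator column needs to be unambiguous for every indicator.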

Waiting to hear back from Census / HPD
[ ] In the process of trying to download the 2008-2012 5 Year ACS, we discovered that the Census GUI and API don't have the same PUMA level geography because of a change in the way PUMA areas were defined (see page 2 (https://www2.census.gov/programs-surveys/acs/tech_docs/pums/ACS2008_2012_PUMS_README.pdf)). Do you know how we should deal with this issue moving forward? Should we take the 2007-2011 5 year ACS data instead?

Other data Questions

Once indicator list is finalized, what kind of data sources will we need for certain indicators (outside of ACS and HVS)?
- And how can we build infrastructure to handle future/updated releases of EDTT + data?
- How often is the data released?
Survey response rates and standard errors. I want to check this with an SME when we have a chance. I calculated the median age by race and checked my work in PUMA 4011, which the 2019 5-year estimates put at 55% black non-Hispanic, 31% white non-Hispanic, 2.5% Asian, 8.7% Hispanic, and 3.0% "other" (none of the first 4 categories). I chose a PUMA in central Brooklyn to check median ages by race because across that part of the city there are many neighborhoods with young white gentrifiers and a longer-standing population of black residents. The median age by race does match this pattern. What surprised me is that the standard error for median black non-Hispanic age is 2.13 years, whereas it's less than one year for Hispanics and white/Asian non-Hispanics. I realized this was because of survey response rates, and sure enough there are only 3.4 responses per 100 black residents compared to around 5 responses per 100 white/Asian residents. That all passes the smell test. What gives me some pause is that there are only 3.8 responses per 100 Hispanic residents but half the standard error. Could it be a better age distribution among those who completed the survey? I just want to make sure I fully understand what I'm doing; I don't think it's important enough to ping anyone at the moment.
There is a group of 7 PUMAs for which the SE on median age is 0. The next lowest is 0.222. This doesn't pass the smell test to me.
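
One possible explanation for an SE of exactly 0, sketched below under assumptions: if the SE comes from the usual ACS PUMS replicate-weight formula, SE = sqrt(4/80 × Σ(replicate estimate − full-sample estimate)²), and ages are whole numbers, then every replicate median can land on the same integer as the full-sample median and the sum is exactly zero. The toy data uses 4 replicates instead of 80 and invented ages and weights; the weighted-median rule here is one simple choice and may differ from the one in the pipeline.

```python
import math


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


def replicate_se(estimate, replicate_estimates):
    """sqrt(4/R * sum((rep - est)^2)); R is 80 for ACS PUMS."""
    r = len(replicate_estimates)
    return math.sqrt(4.0 / r * sum((rep - estimate) ** 2 for rep in replicate_estimates))


# Toy data (made up): integer ages, one full-sample weight, 4 replicate weights
ages = [25, 34, 34, 34, 61]
full_weights = [10, 10, 10, 10, 10]
replicate_weights = [
    [12, 9, 10, 11, 8],
    [8, 11, 10, 9, 12],
    [11, 10, 9, 10, 10],
    [9, 10, 11, 10, 10],
]

median_age = weighted_median(ages, full_weights)
replicate_medians = [weighted_median(ages, w) for w in replicate_weights]
se = replicate_se(median_age, replicate_medians)
```

Here the replicate weights shift, but the median never leaves the cluster of respondents aged 34, so the SE is 0. Whether that is what happens in those 7 PUMAs would need checking against the actual replicate medians.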

Archived Questions
- [x] How do ACS PUMS INDP and OCCP variables map to the categories listed in the "Employment by industry sector" and "Employment by occupation" indicators in the "household economic security" sheet?
  
  See 2015-2019 ACS 5-year PUMS Code Lists here

SashaWeinstein commented 2 years ago

I had a question on how to calculate the variance of a fraction that I sent to Erica yesterday. In the Intro 1572-B data matrix - Current that I've been working off of in November there is a "denominator" column in the field specifications. I assumed this meant that indicators for which there is a denominator are fractions.

Erica said, however, that the "Limited English Proficiency" (lep_19) variable is not a fraction and the denominator just specifies a "filter."

I still don't know how to calculate the variance of a fraction using replicate weights but I don't know if I will need to after all.
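
In case it does turn out to be needed, a minimal sketch of one common approach: recompute the fraction once per replicate weight, then apply the same replicate-weight formula used for counts (variance = 4/80 × Σ(replicate fraction − full-sample fraction)²). `PWGTP` and `PWGTP1`…`PWGTP80` are the PUMS person weights; the `lep` field, the helper names, and the toy numbers (only 2 replicates) are assumptions for illustration, not the project's method.

```python
import math


def weighted_share(records, in_numerator, in_denominator, weight):
    """Weighted numerator divided by weighted denominator for one weight column."""
    denom = sum(r[weight] for r in records if in_denominator(r))
    num = sum(
        r[weight] for r in records if in_denominator(r) and in_numerator(r)
    )
    return num / denom if denom else float("nan")


def share_with_se(records, in_numerator, in_denominator, n_reps=80):
    """Return (share, standard error) using replicate weights PWGTP1..PWGTP<n_reps>."""
    est = weighted_share(records, in_numerator, in_denominator, "PWGTP")
    reps = [
        weighted_share(records, in_numerator, in_denominator, f"PWGTP{i}")
        for i in range(1, n_reps + 1)
    ]
    variance = 4.0 / n_reps * sum((rep - est) ** 2 for rep in reps)
    return est, math.sqrt(variance)


# Toy records with 2 replicate weights (real PUMS has 80)
records = [
    {"PWGTP": 10, "PWGTP1": 12, "PWGTP2": 8, "lep": True},
    {"PWGTP": 30, "PWGTP1": 28, "PWGTP2": 32, "lep": False},
]

share, se = share_with_se(records, lambda r: r["lep"], lambda r: True, n_reps=2)
```

The point of the design is that the numerator and denominator are never given separate variances; each replicate produces its own fraction, so the covariance between the two is handled implicitly.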

The most important thing I learned is that the data matrix is still a work in progress. I should focus on doing what I know how to do for sure and then fill in the gaps when it's finalized in mid-December.

SashaWeinstein commented 2 years ago

Employment by occupation, industry sector

The two ACS PUMS variables I can find for industry and occupation are INDP (Industry recode for 2018 and later based on 2017 IND codes (INDP)) and OCCP (Occupation recode for 2018 and later based on 2018 OCC codes (OCCP)). Both are much more granular than the categories listed in the "Employment by occupation" line in the "household economic security" sheet of the data matrix.

Is the idea that there will be a recode to map each INDP or OCCP code in the PUMS data to one of the categories in the sheet?

I'm hesitant to add variables to the ingestion pipeline before it's confirmed that I'm using the correct variables and we know how they will be used to create indicators. My current plan is to add PUMS variables that are definitely used for the household economic security indicators and wait on those that are not clear.
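
If the answer to the recode question is yes, the mapping could be as small as a lookup table of code ranges. This is only a sketch of the shape such a recode might take: the ranges and category labels below are placeholders, not the real crosswalk, which would have to come from the PUMS code lists and the finalized data matrix.

```python
# HYPOTHETICAL ranges and labels, for illustration only.
OCCP_RECODE = [
    (range(10, 1000), "category_a"),
    (range(1000, 2000), "category_b"),
]


def recode(code, table, default="unmapped"):
    """Map a PUMS code (often a zero-padded string) to a broad category."""
    try:
        value = int(code)
    except (TypeError, ValueError):
        # Blank or missing codes (e.g. people with no occupation) fall through
        return default
    for codes, label in table:
        if value in codes:
            return label
    return default


examples = {code: recode(code, OCCP_RECODE) for code in ["0010", "1500", "9999", ""]}
```

Keeping the table as data rather than code would make it easy to swap in the confirmed categories later without touching the ingestion logic.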

SashaWeinstein commented 2 years ago

I have a question that may need a subject matter expert. For some data points we don't have any responses that fall into that category. In the testing data, for example, there are zero black non-Hispanic respondents with limited English proficiency in PUMA 3901 (southern Staten Island). The "count" measure for this is of course zero. It's also reported as a fraction of all limited English proficiency people, i.e. the fraction of all people with limited English proficiency in southern Staten Island who are black non-Hispanic. This should be zero too: 0% of those New Yorkers are black non-Hispanic. But how do we report the variance of these data points? Are the margin of error and coefficient of variation N/A?
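
To make the zero case concrete, here is a small sketch of what the arithmetic gives, assuming the MOE is 1.645 × SE (the 90% level the Census Bureau uses for ACS) and the CV is SE divided by the estimate. With no respondents in a cell, every replicate estimate is also 0, so the replicate-weight SE is 0; the CV is then 0/0 and undefined. Returning `None` for it is my placeholder, and how to publish these cells is still a question for an SME.

```python
Z_90 = 1.645  # multiplier for a 90% margin of error


def reliability(estimate, se):
    """Return the estimate with its MOE and CV; CV is None when the estimate is 0."""
    moe = Z_90 * se
    # CV = SE / estimate is undefined for a zero estimate, so flag it instead
    cv = None if estimate == 0 else 100.0 * se / abs(estimate)
    return {"estimate": estimate, "se": se, "moe": moe, "cv": cv}


# A cell with no respondents: estimate and every replicate estimate are 0
empty_cell = reliability(0, 0.0)

# A made-up non-empty cell for comparison
normal_cell = reliability(200, 20.0)
```

An MOE of 0 on an empty cell overstates our certainty, since the true count could be small but nonzero, which is another reason to confirm the reporting rule rather than publish the raw arithmetic.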

NYCPlanning / db-equitable-development-tool

Questions #14
