Open AmandaDoyle opened 2 years ago
I had a question on how to calculate the variance of a fraction that I sent to Erica yesterday. In the Intro 1572-B data matrix - Current that I've been working off of in November there is a "denominator" column in the field specifications. I assumed this meant that indicators for which there is a denominator are fractions.
Erica said, however, that the "Limited English Proficiency" (lep_19) variable is not a fraction and that the denominator just specifies a "filter."
I still don't know how to calculate the variance of a fraction using replicate weights but I don't know if I will need to after all.
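For reference on the replicate-weight question above, here is a minimal sketch of one common approach: recompute the whole fraction once per replicate weight, then apply the ACS PUMS replicate formula, variance = (4/80) × the sum of squared differences between each replicate estimate and the full-sample estimate. The function names and the toy inputs are hypothetical; in real PUMS person records the full weight is PWGTP and the replicates are PWGTP1 through PWGTP80. Whether a given indicator really is a fraction is still the open question from the data matrix.

```python
from math import sqrt

def weighted_fraction(in_num, in_den, weights):
    """Weighted share of denominator records that are also in the numerator."""
    den = sum(w for w, d in zip(weights, in_den) if d)
    num = sum(w for w, n, d in zip(weights, in_num, in_den) if n and d)
    return num / den if den else None

def replicate_se(in_num, in_den, full_weights, replicate_weights):
    """Return (fraction, standard error) using replicate weights.

    The fraction is recomputed under every replicate weight, then
    variance = (4 / R) * sum((replicate - full) ** 2), where R is the
    number of replicates (80 for ACS PUMS).
    """
    full = weighted_fraction(in_num, in_den, full_weights)
    reps = [weighted_fraction(in_num, in_den, rw) for rw in replicate_weights]
    if full is None or any(r is None for r in reps):
        return full, None  # undefined when a denominator is empty
    variance = 4 / len(reps) * sum((r - full) ** 2 for r in reps)
    return full, sqrt(variance)

# Toy data (hypothetical): 4 records, only 2 replicates to keep it short.
in_num = [1, 0, 1, 0]          # record is in the numerator group
in_den = [1, 1, 1, 0]          # record passes the denominator "filter"
full_w = [10, 20, 10, 5]
rep_w = [[10, 20, 10, 5], [20, 20, 20, 5]]
fraction, se = replicate_se(in_num, in_den, full_w, rep_w)
```

The key point is that the numerator and denominator are re-weighted together inside each replicate, rather than combining two separately computed standard errors.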
The most important thing I learned is that the data matrix is still a work in progress. I should focus on doing what I know how to do for sure and then fill in the gaps when it's finalized in mid-December.
The two ACS PUMS variables for occupation I can find are INDP (Industry recode for 2018 and later based on 2017 IND codes) and OCCP (Occupation recode for 2018 and later based on 2018 OCC codes). Both are much more granular than the categories listed in the "Employment by occupation" line in the "household economic security" sheet of the data matrix.
Is the idea that there will be a recode to map each occupation in INDP or OCCP in the PUMS data to one of the categories in the sheet?
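If a recode is the plan, it could look something like the sketch below: a range-based crosswalk from detailed OCCP codes to broader categories. Everything in the table is a placeholder. The code ranges and the category labels are hypothetical and would need to come from the finalized data matrix, and treating a blank value as "not in universe" is also an assumption.

```python
# HYPOTHETICAL crosswalk: (low, high, category). Both the code ranges and the
# category labels are placeholders, not the data matrix's actual definitions.
OCC_CROSSWALK = [
    (10, 3550, "Category A (placeholder)"),
    (3601, 4655, "Category B (placeholder)"),
    (4700, 5940, "Category C (placeholder)"),
]

def recode_occp(occp):
    """Map one OCCP value (string or int, possibly blank) to a category."""
    if occp is None or str(occp).strip() == "":
        return None  # assumption: blank means the person is not in universe
    code = int(occp)
    for low, high, label in OCC_CROSSWALK:
        if low <= code <= high:
            return label
    return "Unmapped"  # flag codes the crosswalk does not cover

examples = [recode_occp(v) for v in ["0010", 4000, "", 9999]]
```

Returning "Unmapped" instead of raising makes it easy to list which codes still need a category once the real crosswalk exists.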
I'm hesitant to add variables to the ingestion pipeline before it's confirmed that I'm using the correct variables and we know how they will be used to create indicators. My current plan is to add PUMS variables that are definitely used for the household economic security indicators and wait on those that are not clear.
I have a question that may need a subject matter expert. For some data points we don't have any responses that fall into that category. In the testing data, for example, there are zero Black non-Hispanic respondents with limited English proficiency in PUMA 3901 (southern Staten Island). The "count" measure for this is of course zero. It's also reported as a fraction of all people with limited English proficiency, i.e. the fraction of all people with limited English proficiency in southern Staten Island who are Black non-Hispanic. This should be zero too, since 0% of those New Yorkers are Black non-Hispanic. But how do we report the variance of these data points? Are the margin of error and coefficient of variation N/A?
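To make the arithmetic behind this question concrete, here is a small sketch. It does not settle what should be published; that is still a question for the SME. It only shows that the coefficient of variation (standard error divided by the estimate) has no defined value when the estimate is zero. Note also that when no sample records fall in the category, every replicate estimate is zero as well, so the replicate formula returns a standard error of exactly 0. The function name is hypothetical; the 1.645 multiplier is the 90% confidence level the ACS uses for its published margins of error.

```python
Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level

def moe_and_cv(estimate, se):
    """Return (margin of error, coefficient of variation in percent).

    Either value is None when it cannot be computed, which a report
    could render as "N/A".
    """
    moe = None if se is None else Z_90 * se
    # CV = SE / estimate, so it is undefined when the estimate is 0.
    cv = None if se is None or estimate == 0 else se / estimate * 100
    return moe, cv

typical = moe_and_cv(200, 50)    # ordinary data point
empty_cell = moe_and_cv(0, 0.0)  # zero respondents in the category
```

For the empty cell the margin of error comes out as 0.0, which overstates the certainty of the estimate, so reporting it as N/A alongside the CV may be the safer choice.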
New Questions
[ ] In the housing security and quality category there are 3 indicators that are not required by legislation; therefore, we're not working on them. There are 5 indicators that are required by legislation but the design is not complete. SME is HPD or DHS.
[ ] In the housing economic security category there are 7 indicators that are required by legislation but the design is not complete. SME is HPD/DCP Population Division. There are 3 indicators that are not required by legislation; therefore, we're not working on them. Is "Poverty" required by legislation?
[ ] In the QOL and opportunity category there are 3 indicators that are high priority but the design is not complete. SME is unspecified.
[ ] For data points that need to be reported out as both a count and percent nativity(1,2))?
[ ] Standard error issue for median calculation for HVS and PUMS
[ ] In the process of trying to download the 2008-2012 5 Year ACS, we discovered that the Census GUI and API don't have the same PUMA level geography because of a change in the way PUMA areas were defined (see page 2 (https://www2.census.gov/programs-surveys/acs/tech_docs/pums/ACS2008_2012_PUMS_README.pdf)). Do you know how we should deal with this issue moving forward? Should we take the 2007-2011 5 year ACS data instead?
Once indicator list is finalized, what kind of data sources will we need for certain indicators (outside of ACS and HVS)?
Survey response rates and standard errors. I want to check this with a SME when we have a chance. I calculated the median age by race and checked my work in PUMA 4011. It's a PUMA that the 2019 5-year estimates put at 55% Black non-Hispanic, 31% white non-Hispanic, 2.5% Asian, 8.7% Hispanic, and 3.0% "other" (none of the first 4 categories). I chose a PUMA in central Brooklyn to check median ages by race because across that part of the city there are many neighborhoods with young white gentrifiers and a longer-standing population of Black residents. The median age by race does match this pattern. What surprised me is that the standard error for the median Black non-Hispanic age is 2.13 years, whereas it's less than one year for Hispanic and white/Asian non-Hispanic residents. I realized this was because of survey response rates, and sure enough there are only 3.4 responses per 100 Black residents compared to around 5 responses per 100 white/Asian residents. That all passes the smell test. What gives me some pause is that there are only 3.8 responses per 100 Hispanic residents but half the standard error. Could it be a better age distribution among those who completed the survey? I just want to make sure I fully understand what I'm doing; I don't think it's important enough to ping anyone at this moment.
There is a group of 7 PUMAs for which the SE on median age is 0. The next lowest is 0.222. This doesn't pass the smell test to me.
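One possible explanation for the zero standard errors, not yet confirmed, is sketched below. Ages are whole years, so if the median is read directly off the weighted records (an assumption; an interpolated median would behave differently), every replicate median can land on the same integer as the full-sample median, and the replicate formula then returns exactly 0. The function names and toy data are hypothetical.

```python
from math import sqrt

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return None  # only reached when there are no records

def median_se(values, full_weights, replicate_weights):
    """Return (median, SE) with variance = (4 / R) * sum((rep - full) ** 2)."""
    full = weighted_median(values, full_weights)
    reps = [weighted_median(values, rw) for rw in replicate_weights]
    return full, sqrt(4 / len(reps) * sum((r - full) ** 2 for r in reps))

# Toy data (hypothetical): many records share the median age of 30, so
# perturbing the weights never moves the median off that value.
ages = [25, 30, 30, 30, 62]
full_w = [10, 10, 10, 10, 10]
rep_w = [[12, 9, 11, 10, 8], [8, 11, 10, 9, 12]]
result = median_se(ages, full_w, rep_w)
```

If this is what is happening in those 7 PUMAs, the zero reflects the discreteness of the age variable rather than a truly precise estimate, which is worth raising with the SME alongside the median standard error item above.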
Archived Questions
INDP and OCCP variables map to the categories listed in the "Employment by industry sector" and "Employment by occupation" indicators in the "household economic security" sheet?