Census variables don't match

JasonSills commented 4 years ago

Hi @lecy,

I'm looking through the census data using:

VarSearch <- load_variables( 2017, "acs5", cache=TRUE )
VarSearch$label <- toupper(VarSearch$label)
view(VarSearch)

I noticed in the lectures the variables that are used don't match the variables in the dataset. For example, income in the lecture is used as: "B19013_001E", "B19013_001M". However, these are not in the census data. There is a B19013E_001 and nothing with an M. How do I know which variable to use?

lecy commented 4 years ago

I'm not sure I fully understand your question, but in case it is about the difference between the E and M variables:

E stands for "estimate" or the actual value to use.

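M stands for "margin of error", which measures the sampling uncertainty of the tract estimate being reported. It is closely related to the standard error: the ACS publishes margins at the 90 percent confidence level, so the margin is roughly 1.645 times the standard error.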
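
To make the relationship between a margin of error and a standard error concrete, here is a minimal sketch in base R. It assumes the margin is reported at the 90 percent confidence level, as ACS margins are. The helper name `moe_to_se` and the tract numbers are hypothetical, chosen only for illustration.

```r
# ACS margins of error are published at the 90 percent confidence level,
# so dividing by the z-score 1.645 gives an approximate standard error.
# moe_to_se is a hypothetical helper, not part of any package.
moe_to_se <- function(moe, z = 1.645) {
  moe / z
}

# Hypothetical tract values, for illustration only (not real Census figures).
estimate <- 52000
moe      <- 4100

se    <- moe_to_se(moe)
ci_90 <- c(lower = estimate - moe, upper = estimate + moe)

se     # approximate standard error of the estimate
ci_90  # 90 percent confidence interval: estimate plus or minus the margin
```

The interval is simply the E value plus or minus the M value, which is why the two variables are published as a pair.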

https://api.census.gov/data/2017/acs/acs5/groups/B19013.html

Note that the smaller the slice of data used to create the estimate (for example, income for an entire tract versus income for a specific race + age group within the tract), the larger the margin of error.

We are not using error margins for this course, but if you were doing an advanced econometrics class using census data you would bootstrap models using multiple draws of simulated data created using the margins, not unlike how the website 538 creates election forecasts by aggregating polling data.
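
As a rough sketch of the simulation idea, the base R code below redraws each tract's income many times from a distribution built from its estimate and margin, then recomputes a statistic on every simulated data set. It assumes a normal sampling distribution and 90 percent margins; the tract values, the seed, and the number of draws are all hypothetical, and this is an illustration rather than the method an econometrics course would necessarily prescribe.

```r
set.seed(529)  # arbitrary seed so the draws are reproducible

# Hypothetical tract estimates and margins of error (not real Census figures).
tracts <- data.frame(
  tract    = c("A", "B", "C"),
  estimate = c(52000, 38000, 91000),
  moe      = c(4100, 6500, 12000)
)

# Approximate standard error from a 90 percent margin of error.
tracts$se <- tracts$moe / 1.645

# One column per simulated data set: each tract's income is redrawn from a
# normal distribution centred on its estimate (an assumed approximation).
n_draws <- 1000
draws <- sapply(seq_len(n_draws), function(i) {
  rnorm(nrow(tracts), mean = tracts$estimate, sd = tracts$se)
})

# Recompute a statistic on every simulated data set and summarise its spread.
sim_means <- colMeans(draws)
quantile(sim_means, c(0.05, 0.5, 0.95))
```

In a real analysis the statistic would be a fitted model coefficient rather than a simple mean, but the loop has the same shape: simulate, refit, collect.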

https://projects.fivethirtyeight.com/2020-election-forecast/

lecy commented 4 years ago

> There is a B19013E_001 and nothing with an M. How do I know which variable to use?

If you pull that variable, does it generate an E and M version for you through the API call?

JasonSills commented 4 years ago

Hi @lecy

Perhaps this will clear up what I mean. In the lecture notes we have this code below. However, I cannot find these variables in census data.

I tried the B19013E_001 variable, but NA came through for the estimate. I switched to B19013_001. It works, I just want to make sure I'm pulling the data correctly and that I'm understanding what's going on here. I understand that e on the end is for estimate, but I'm not seeing these variables in the census pull. I would love to learn how to use these because I do want to work with estimates.

lecy commented 4 years ago

> I tried the B19013E_001

Should that be B19013_001E?

JasonSills commented 4 years ago

There is B19013E_001 and I tried it, but the values were NA. So B19013_001E doesn't seem to exist, even though it was used in the lecture. B19013E_001 does exist, but it's not giving me values. B19013_001 is what I'm currently using.

lecy commented 4 years ago

@Anthony-Howell-PhD do you know anything about this?

Did the Census change variable names in the API recently?

AntJam-Howell commented 4 years ago

It looks like there was a change. However, if you request B19013_001E through the get_acs call, it still grabs the income data and automatically drops the E from the variable name. I'm not sure why GetCensus throws an error. In either case, drop the E from the variable name to obtain the income variable.

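library(tidycensus)

# Register your Census API key (replace the placeholder with your own key)
census_key <- "xxxxxxxxxxxxxxxxxx"
census_api_key(census_key)

# Median household income for Vermont counties;
# the trailing E is accepted here and dropped from the returned variable name
vt <- get_acs(geography = "county",
              variables = c("B19013_001E"),
              state = "VT",
              year = 2018)

# Estimates and margins of error come back in the estimate and moe columns
vt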
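
The naming confusion in this thread can be sketched with a small base R helper that splits a detailed-table variable name into its table ID, line number, and optional E/M suffix. The function name `parse_acs_variable` and the pattern are assumptions made for illustration, and the pattern only covers names shaped like the ones discussed here. It shows that B19013E_001 parses as line 001 of a separate table, B19013E, rather than as the estimate for B19013. As far as I can tell, a letter attached to the table number usually marks a race or ethnicity iteration of the base table, which would be consistent with the NA values reported above.

```r
# Hypothetical helper: split names such as "B19013_001E" into parts.
# Assumed pattern: table ID (letter, five digits, optional iteration letter),
# underscore, three-digit line number, optional E or M suffix.
parse_acs_variable <- function(x) {
  pattern <- "^([A-Z][0-9]{5}[A-Z]?)_([0-9]{3})([EM]?)$"
  parts <- regmatches(x, regexec(pattern, x))
  rows <- lapply(seq_along(x), function(i) {
    p <- parts[[i]]
    if (length(p) == 0) p <- rep(NA_character_, 4)  # name did not match
    data.frame(input = x[i], table = p[2], line = p[3], suffix = p[4],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

parsed <- parse_acs_variable(
  c("B19013_001E", "B19013_001M", "B19013_001", "B19013E_001")
)

# Name with the E/M suffix removed, the form used in the lab.
parsed$base_name <- paste(parsed$table, parsed$line, sep = "_")
parsed
```

The first three names share the base name B19013_001, while the fourth keeps B19013E_001 because the E there belongs to the table ID.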

ekmcintyre commented 4 years ago

I assumed the variables we use in the lab were supposed to be different from the lecture notes. I used the first row variable name after searching "median value" and "median income" in the data set and got "B25077_001" and "B19013_001" for those respectively. I have completed all the coding and answered the questions at this point. Did I use the incorrect variables?

lecy commented 4 years ago

@ekmcintyre Looks good - those variables are correct.

The question above was about changes to the Census API name convention generally, so the variable was from the lecture notes, not the lab.
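
The keyword lookup described above can be sketched in base R as follows. The data frame is a small stand-in for the table that load_variables() returns (columns name, label, concept); its rows are abbreviated and illustrative, not copied from the Census, and `find_variables` is a hypothetical helper.

```r
# Stand-in for the output of load_variables(); rows are abbreviated and
# illustrative only, not the real Census labels.
vars <- data.frame(
  name    = c("B19013_001", "B19013E_001", "B25077_001", "B01003_001"),
  label   = c("Estimate!!Median household income",
              "Estimate!!Median household income",
              "Estimate!!Median value (dollars)",
              "Estimate!!Total"),
  concept = c("MEDIAN HOUSEHOLD INCOME",
              "MEDIAN HOUSEHOLD INCOME (ONE RACE GROUP)",
              "MEDIAN VALUE (DOLLARS)",
              "TOTAL POPULATION"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive keyword search over label and concept.
find_variables <- function(vars, keyword) {
  hit <- grepl(keyword, vars$label, ignore.case = TRUE) |
         grepl(keyword, vars$concept, ignore.case = TRUE)
  vars[hit, c("name", "concept")]
}

income_hits <- find_variables(vars, "median household income")
value_hits  <- find_variables(vars, "median value")

income_hits
value_hits
```

Reading the concept column alongside the name helps separate the base table from its letter-suffixed variants before choosing the first match.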

DS4PS / cpp-529-fall-2020