Closed MicheleTobias closed 1 year ago
And to know which variables the domain team wants to look at.
The tidycensus package for R can list the variables available for various geographies: https://www.ncdemography.org/2022/05/16/story-recipe-how-to-obtain-census-data-using-r-tidycensus/
CA Dept. of Finance has a nice explainer slide deck: https://dof.ca.gov/wp-content/uploads/Reports/Demographic_Reports/American_Community_Survey/documents/ACS_Variables_Nov2012.pdf
Variables of interest:
If you find a variable, please provide details in a comment. This way, I can get started on the analysis part as Alison and Sebastian find the variables.
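One way to hunt for candidate variables is to search the label and concept text of the variable table that tidycensus provides. This is only a sketch: the small table below is a made-up stand-in for the output of `load_variables()` (the year in the commented call is an assumption), and `find_vars()` is a hypothetical helper, not part of tidycensus.

```r
# Toy stand-in for the table returned by tidycensus::load_variables().
# The real table could be fetched with something like (year is an assumption):
#   vars_acs <- tidycensus::load_variables(2021, "acs5", cache = TRUE)
vars_acs <- data.frame(
  name = c("B19013_001", "B27010_001", "B08301_001"),
  label = c(
    "Estimate!!Median household income in the past 12 months (in 2021 inflation-adjusted dollars)",
    "Estimate!!Total:",
    "Estimate!!Total:"
  ),
  concept = c(
    "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)",
    "TYPES OF HEALTH INSURANCE COVERAGE BY AGE",
    "MEANS OF TRANSPORTATION TO WORK"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose label or concept matches a pattern.
find_vars <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

income_vars <- find_vars(vars_acs, "income")
print(income_vars$name)
```

The same helper can be pointed at the real table once it is loaded, for example `find_vars(vars_acs, "insurance")`.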
Race variables for all census tracts
library(tidycensus)

all_fips <- c(1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22,
              23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
              39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56,
              72) # 72 is PR

race_vars <- c(
  all   = "P2_001N", # All
  hisp  = "P2_002N", # Hispanic
  white = "P2_005N", # White
  baa   = "P2_006N", # Black or African American
  amin  = "P2_007N", # American Indian
  asian = "P2_008N", # Asian
  nhopi = "P2_009N", # Native Hawaiian or Pacific Islander
  other = "P2_010N", # Some Other Race
  multi = "P2_011N"  # Two or More Races
)

race <- get_decennial(
  geography = "tract",
  variables = race_vars,
  year = 2020,
  geometry = FALSE,
  cache_table = TRUE,
  state = all_fips
)
Hispanic or Latino variables
hisp_vars <- c(
  hisp     = "P2_002N", # Hispanic
  not_hisp = "P2_003N"  # Not Hispanic
)

hisp <- get_decennial(
  geography = "tract",
  variables = hisp_vars,
  year = 2020,
  geometry = FALSE,
  cache_table = TRUE,
  state = all_fips
)
This is what I have so far. I'm not sure why an error pops up when I set geometry = T. Also, I believe these are the only variables available right now for the 2020 Census in tidycensus, but I could be wrong.
We just need to list the variable names for now
When you use get_decennial(), I believe you're asking for the decennial census data (the once-every-10-years data). get_acs() is for the American Community Survey (ACS) data that the Census Bureau compiles yearly. We also don't need the geometries as long as the data includes the Tract ID.
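To illustrate why the Tract ID is enough: tables can be joined on the GEOID column without any geometry. The sketch below uses two made-up tables (the values and the `prior_value` column are hypothetical). A tract GEOID is 11 characters: 2 for the state, 3 for the county, and 6 for the tract, so it should stay a character string to keep leading zeros.

```r
# Toy tables shaped like tidycensus output (GEOID plus values); numbers are made up.
census <- data.frame(
  GEOID = c("06113010100", "06113010200"),
  total_pop = c(4200, 3100),
  stringsAsFactors = FALSE
)

# Hypothetical table from an earlier analysis, keyed by the same tract GEOID.
other <- data.frame(
  GEOID = c("06113010200", "06113010100"),
  prior_value = c(1, 0),
  stringsAsFactors = FALSE
)

# Tract GEOID = 2-digit state + 3-digit county + 6-digit tract code.
census$state_fips <- substr(census$GEOID, 1, 2)
census$county_fips <- substr(census$GEOID, 3, 5)

# Join on the tract ID; no geometry column is needed for this.
joined <- merge(census, other, by = "GEOID")
print(joined)
```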
I made a table to keep track of the variables, column names, and a human-readable description: census_variables.csv
Where would we be getting the column name from? I have been finding the names from vars_acs, and each code has a name, label, concept, and geography.
You get to make up a column name that is short but meaningful.
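As a rough sketch of how made-up column names can be used later: tidycensus accepts a named vector for `variables` and reports the names instead of the raw codes. The table below is a stand-in for census_variables.csv; its column headers (`variable`, `column_name`, `description`) and the short names are assumptions, since the actual file layout isn't shown here.

```r
# Stand-in for census_variables.csv; the header names are assumptions.
var_table <- data.frame(
  variable = c("P2_001N", "P2_002N", "P2_005N"),
  column_name = c("pop_total", "pop_hisp", "pop_white"),
  description = c(
    "Total population",
    "Hispanic or Latino",
    "White alone, not Hispanic or Latino"
  ),
  stringsAsFactors = FALSE
)

# Named vector: names are the short column names, values are the census codes.
named_vars <- setNames(var_table$variable, var_table$column_name)
print(named_vars)
```

A vector like `named_vars` can then be passed as the `variables` argument of get_decennial() or get_acs().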
@alisonsnwong I see that you added the health insurance coverage. There is also an option to do it by age without sex being included: codes B27010_001 through B27010_043, which include an estimated total. Do we know if the total still allows for geographical data? (It does list tract as a geography, but I am not sure how this would be reflected in the actual data.) If we want to prioritize having fewer variables weighting one another and keep the variable list smaller, this is definitely an option to consider.
For the SES, I have been trying to find a variable with that information. The one I found is the poverty status. Is that what we are looking for?
Another option is the poverty level by nativity of children, which has an index, but I assume that is restricted to families with children, so it might not be what we are looking for.
> @alisonsnwong I see that you added the health insurance coverage. There is also an option to do it by age without sex being included: codes B27010_001 through B27010_043, which include an estimated total. Do we know if the total still allows for geographical data? (It does list tract as a geography, but I am not sure how this would be reflected in the actual data.) If we want to prioritize having fewer variables weighting one another and keep the variable list smaller, this is definitely an option to consider.
That's better in my opinion and will make the analysis easier. I will consider adding it to the file. I probably missed that option while looking for it!
For the SES variable, most sources online state that SES is often measured as a combination of education, income, and occupation. We have the education, income, and employment status variables. Sebastian also added the poverty level, and I feel like all these variables together represent SES well.
For the transportation variables in the census_variables.csv file, the "category" column says the variable includes the sex of the people surveyed, but the description column doesn't seem to address that. Is this a sex-based metric or does the "category" column need to be updated?
Also, we need units for each of the variables.
A lot of these variables are further divided by other variables (sex, number of children, poverty level, etc.), but the census also provides the sum. For transportation specifically, the category is named that way by the census, but we chose the total per category, meaning both male and female are included for each means of transportation.
To my understanding, each variable will output the number of people in that category per county. Can you clarify what units we should be including?
We need to clarify that we are using the sum if the census offers it in various divisions. The category column wasn't intended to be a copy/paste from the census but rather it should coincide with the concept of the variable we identified in the meeting where we discussed variables - race, income, transportation...
We need units because not everything is people. Income is some kind of dollars, maybe in units of 10,000? Are the people variables straight up people or 100s of people? Maybe 1,000s of people? We need to know for sure.
Income might also be per month. I really don't know the units on that one.
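One possible way to pin down units is to read them off the variable metadata. In ACS metadata, dollar-valued tables say so in the concept text (for example "in the past 12 months (in 2021 inflation-adjusted dollars)"), and count tables report plain counts rather than hundreds or thousands. The sketch below runs on a made-up two-row stand-in for the metadata table, and `guess_unit()` is a hypothetical, deliberately crude heuristic that should be checked by hand for every variable.

```r
# Toy stand-in for variable metadata (name + concept); rows are illustrative only.
vars_meta <- data.frame(
  name = c("B19013_001", "B08301_001"),
  concept = c(
    "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)",
    "MEANS OF TRANSPORTATION TO WORK"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical heuristic: flag dollar-valued variables by the word "dollars".
# Everything else is assumed to be a count, which will be wrong for
# medians, ratios, or percentages that are not expressed in dollars.
guess_unit <- function(concept) {
  ifelse(grepl("dollars", concept, ignore.case = TRUE), "dollars", "count")
}

vars_meta$unit <- guess_unit(vars_meta$concept)
print(vars_meta[, c("name", "unit")])
```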
@alisonsnwong it looks like the income you included is actually block data. You might want to check if that is messing with your code. If we want to stick to tract data only, it looks like we would have to look into individual income instead, since there is no option to do tract data with family or household income. I can fix this, just let me know what you think.
Good catch! Yes, we need to use tract data to match up with the previous analysis.
@erklopez if you found a tract-level income variable, please replace that in the census variables table. @alisonsnwong your code should work with a new set of variables, I believe. You just need to swap in the list of variables.
I believe the geography column shows the smallest geography at which a given variable is available. Since the block group level is smaller than the tract level, tract data is also available for the income variable (or any other variable that shows block group in its geography column). Here's an article I found about it: https://walker-data.com/tidycensus/articles/basic-usage.html
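The "smallest geography" reasoning can be sketched as a small check. The ordering below is a simplified assumption (real Census geographies do not form a single chain), and `available_at()` is a hypothetical helper, not a tidycensus function.

```r
# Simplified, assumed ordering of geography levels from smallest to largest.
geo_levels <- c("block group", "tract", "county", "state", "us")

# Hypothetical helper: TRUE when a variable whose smallest published
# geography is `smallest` should also be available at `target`.
available_at <- function(smallest, target, levels = geo_levels) {
  match(target, levels) >= match(smallest, levels)
}

print(available_at("block group", "tract"))
print(available_at("county", "tract"))
```

Under this assumption, a variable listed as block group passes the tract check, while one listed as county does not.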
@alisonsnwong so it sounds like the code you're working on already deals with this then?
Yeah. I think most of the variables we found should work as long as we specify geography = 'tract' in the argument when using get_acs().
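For reference, a tract-level get_acs() request might look roughly like the sketch below. The variable (B19013_001, median household income), the state, the year, and the short name `med_hh_income` are all assumptions chosen for illustration, not the project's actual settings. The download only runs when tidycensus is installed and a Census API key is set in the CENSUS_API_KEY environment variable.

```r
# Illustrative variable list; the short name is made up.
acs_vars <- c(med_hh_income = "B19013_001")

if (requireNamespace("tidycensus", quietly = TRUE) &&
    nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  income <- tidycensus::get_acs(
    geography = "tract",   # request tract-level estimates
    variables = acs_vars,
    state = "CA",          # assumed state for the example
    year = 2021,           # assumed ACS end year
    survey = "acs5",
    geometry = FALSE       # tract GEOID is returned without geometry
  )
  print(head(income))
} else {
  message("tidycensus or CENSUS_API_KEY not available; skipping the download.")
}
```

The result has one row per tract and variable, with GEOID, NAME, variable, estimate, and moe columns.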
Fantastic teamwork! These are important questions to ask and make sure we've addressed, and I'm glad to hear we've got this covered already.
I think we're happy with the census variables so I'm going to close this issue. If there's something specific we need to address, let's open a new issue.
We need a list of census variables to choose from.