Census Variables - Githubissues

MicheleTobias commented 2 years ago

We need a list of census variables to choose from.

MicheleTobias commented 2 years ago

And to know which variables the domain team wants to look at.

MicheleTobias commented 2 years ago

The tidycensus package for R can list the variables available for various geographies: https://www.ncdemography.org/2022/05/16/story-recipe-how-to-obtain-census-data-using-r-tidycensus/

MicheleTobias commented 2 years ago

CA Dept. of Finance has a nice explainer slide deck: https://dof.ca.gov/wp-content/uploads/Reports/Demographic_Reports/American_Community_Survey/documents/ACS_Variables_Nov2012.pdf

MicheleTobias commented 1 year ago

Variables of interest:

[x] Age
[x] Race
[x] Hispanic origin
[x] Health insurance
[x] Socio-economic Status (SES)
[x] Education
[x] Income
[x] Employment Status
[x] Gender/Sex
[x] Citizenship
[x] Means of transportation

MicheleTobias commented 1 year ago

If you find a variable, please provide details in a comment. This way, I can get started on the analysis part as Alison and Sebastian find the variables.

alisonsnwong commented 1 year ago

Race variables for all census tracts

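library(tidycensus)

# FIPS codes for the 50 states plus Puerto Rico (72); DC (11) is not in this list
all_fips <- c(1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56, 72)

# 2020 table P2: Hispanic or Latino, and not Hispanic or Latino, by race
race_vars <- c(
  all   = "P2_001N", # Total
  hisp  = "P2_002N", # Hispanic or Latino
  white = "P2_005N", # White alone, not Hispanic
  baa   = "P2_006N", # Black or African American alone, not Hispanic
  amin  = "P2_007N", # American Indian and Alaska Native alone, not Hispanic
  asian = "P2_008N", # Asian alone, not Hispanic
  nhopi = "P2_009N", # Native Hawaiian and Other Pacific Islander alone, not Hispanic
  other = "P2_010N", # Some Other Race alone, not Hispanic
  multi = "P2_011N"  # Two or More Races, not Hispanic
)

race <- get_decennial(
  geography   = "tract",
  variables   = race_vars,
  year        = 2020,
  sumfile     = "pl",   # 2020 redistricting (PL 94-171) file, which holds table P2
  geometry    = FALSE,
  cache_table = TRUE,
  state       = all_fips
)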

alisonsnwong commented 1 year ago

Hispanic or Latino variables

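# Hispanic origin from the same 2020 table P2
hisp_vars <- c(
  hisp     = "P2_002N", # Hispanic or Latino
  not_hisp = "P2_003N"  # Not Hispanic or Latino
)

hisp <- get_decennial(
  geography   = "tract",
  variables   = hisp_vars,
  year        = 2020,
  sumfile     = "pl",
  geometry    = FALSE,
  cache_table = TRUE,
  state       = all_fips  # defined with the race variables above
)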
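Since P2_002N (Hispanic or Latino) and P2_003N (Not Hispanic or Latino) split the P2_001N total between them, a quick sanity check on the downloaded tracts is that the two add up to the total. A minimal base-R sketch, assuming the tract data were fetched with `output = "wide"` (one row per tract) under the hypothetical column names `all`, `hisp`, and `not_hisp`; the helper `check_hisp_totals()` and both example rows are made up for illustration:

```r
# Return the GEOIDs of tracts where Hispanic + non-Hispanic counts differ from the total.
# Assumes one row per tract with columns GEOID, all, hisp, not_hisp (hypothetical names).
check_hisp_totals <- function(df) {
  mismatch <- df$hisp + df$not_hisp != df$all
  df$GEOID[which(mismatch)]  # which() drops NA comparisons
}

# Made-up example rows, for illustration only
example <- data.frame(
  GEOID    = c("06113000101", "06113000102"),
  all      = c(4000, 3500),
  hisp     = c(1200, 900),
  not_hisp = c(2800, 2500),
  stringsAsFactors = FALSE
)

check_hisp_totals(example)  # returns "06113000102"
```

An empty result means every tract balances; any GEOIDs returned are worth a closer look before the analysis.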

alisonsnwong commented 1 year ago

This is what I have so far. I'm not sure why an error pops up when I set geometry = T. Also, I believe these are the only variables currently available for the 2020 Census in tidycensus, but I could be wrong.

MicheleTobias commented 1 year ago

We just need to list the variable names for now

MicheleTobias commented 1 year ago

When you use get_decennial(), I believe you're asking for the decennial census data (collected once every 10 years). get_acs() is for the American Community Survey (ACS) data that the Census Bureau compiles yearly. We also don't need the geometries as long as the data includes the Tract ID.
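Both get_decennial() and get_acs() return the tract ID in a GEOID column. A census tract GEOID is 11 digits: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. A small base-R sketch of splitting it, which could help when joining to data keyed by state or county; the helper name `split_tract_geoid()` and the example GEOID are hypothetical:

```r
# Split 11-character tract GEOIDs into state, county, and tract parts.
# Assumes GEOIDs are stored as character strings with leading zeros intact.
split_tract_geoid <- function(geoid) {
  stopifnot(all(nchar(geoid) == 11))
  data.frame(
    GEOID  = geoid,
    state  = substr(geoid, 1, 2),   # state FIPS
    county = substr(geoid, 3, 5),   # county FIPS within the state
    tract  = substr(geoid, 6, 11),  # tract code within the county
    stringsAsFactors = FALSE
  )
}

split_tract_geoid("06113010101")
```

Keeping GEOID as character matters: read in as a number, California's leading "0" in "06" would be dropped and the parts would no longer line up.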

MicheleTobias commented 1 year ago

I made a table to keep track of the variables, column names, and a human-readable description: census_variables.csv
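A table like census_variables.csv can feed straight into tidycensus, since get_acs() accepts a named vector for `variables` and uses the names in place of the raw codes. A sketch, assuming hypothetical headers `variable` and `column_name` (the real file's headers may differ); the CSV is inlined with `text =` so the example runs on its own, and the two rows are illustrative:

```r
# Build a named vector: names are our short column names, values are Census codes.
# The headers "variable" and "column_name" are assumptions about census_variables.csv.
vars_tbl <- read.csv(text = c(
  "variable,column_name,description",
  "B27010_001,hlth_ins_total,Types of health insurance coverage by age: total",
  "B17001_001,pov_total,Poverty status in the past 12 months: total"
), stringsAsFactors = FALSE)

acs_vars <- setNames(vars_tbl$variable, vars_tbl$column_name)
acs_vars
```

The result could then be passed as `get_acs(geography = "tract", variables = acs_vars, ...)`. With `output = "wide"`, tidycensus appends `E` (estimate) and `M` (margin of error) to each name, e.g. `hlth_ins_totalE`.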

erklopez commented 1 year ago

Where would we be getting the column name from? I have been finding the names in vars_acs, where each code has a name, label, concept, and geography.

MicheleTobias commented 1 year ago

You get to make up a column name that is short but meaningful.
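vars_acs most likely comes from tidycensus::load_variables() (for example `load_variables(2020, "acs5", cache = TRUE)`), which returns one row per variable code with its name, label, and concept, plus a geography column for the 5-year ACS in recent versions. Shortlisting codes for a topic is then a plain string search on concept. A sketch, with a hypothetical `find_vars()` helper and a three-row stand-in for the real table (the concept strings follow the ACS style but should be checked against the actual download):

```r
# Keep rows whose concept mentions a keyword (case-insensitive).
# vars_acs below is a tiny stand-in for the output of tidycensus::load_variables().
find_vars <- function(vars, keyword) {
  hits <- grepl(keyword, vars$concept, ignore.case = TRUE)
  vars[hits, c("name", "label", "concept")]
}

vars_acs <- data.frame(
  name    = c("B27010_001", "B17001_001", "B08301_001"),
  label   = c("Estimate!!Total:", "Estimate!!Total:", "Estimate!!Total:"),
  concept = c("TYPES OF HEALTH INSURANCE COVERAGE BY AGE",
              "POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE",
              "MEANS OF TRANSPORTATION TO WORK"),
  stringsAsFactors = FALSE
)

find_vars(vars_acs, "health insurance")
```

The short column name for each hit is still up to us; only the code and description come from the table.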

From: Sebastian Lopez
Sent: Tuesday, December 13, 2022 4:21:55 PM
Subject: Re: [datalab-dev/graves-endocrine_surgeons] Census Variables (Issue #7)

erklopez commented 1 year ago

@alisonsnwong I see that you added the health insurance coverage variables. There is also an option to break it down by age without sex included: codes B27010_001 through B27010_043, which include an estimated total. Do we know if the total still allows for geographic data? (It does list the tract type, but I am not sure how this would be reflected in the actual data.) If we want to prioritize having fewer variables weighting one another and a smaller variable list, this is definitely an option to consider.

erklopez commented 1 year ago

For SES, I have been trying to find a variable with that information. The one I found is poverty status; is that what we are looking for?

erklopez commented 1 year ago

Another option is poverty level by nativity of children, which has an index, but I assume that is restricted to families with children, so it might not be what we are looking for.

alisonsnwong commented 1 year ago

> @alisonsnwong I see that you added the health insurance coverage. There is also an option to do it by age without the sex being included, those are the codes B27010_001-B27010_043 which have the option of doing the estimated total, do we know if the total still allow for geographical data? (it does have the tract type, but I am not sure how this would reflect on the actual data). If we want to prioritize having less variables weighting one another and making the variable list smaller, this is definitely an option to consider.

That's better in my opinion and will make the analysis easier. I will consider adding it to the file. I probably missed that option while looking for it!

alisonsnwong commented 1 year ago

For the SES variable, most sources online state that SES is often measured as a combination of education, income, and occupation. We have the education, income, and employment status variables. Sebastian also added poverty level, and I feel that together these variables represent SES well.

MicheleTobias commented 1 year ago

For the transportation variables in census_variables.csv, the "category" column says the variable includes the sex of the people surveyed, but the description column doesn't seem to address that. Is this a sex-based metric, or does the "category" column need to be updated?

MicheleTobias commented 1 year ago

Also, we need units for each of the variables.

erklopez commented 1 year ago

A lot of these variables will be further divided by other variables (sex, number of children, poverty level, etc), but will also provide the sum. For transportation specifically, the category is named that way by the census, but we chose the total per category, meaning both male and female are included for each means of transportation.

erklopez commented 1 year ago

To my understanding, each variable will output the number of people in that category per county. Can you clarify what units we should be including?

MicheleTobias commented 1 year ago

We need to clarify that we are using the sum if the census offers it in various divisions. The category column wasn't intended to be a copy/paste from the census but rather it should coincide with the concept of the variable we identified in the meeting where we discussed variables - race, income, transportation...

We need units because not everything is people. Income is some kind of dollars, maybe in units of 10,000? Are the people variables straight up people or 100s of people? Maybe 1,000s of people? We need to know for sure.

MicheleTobias commented 1 year ago

Income might also be per month. I really don't know the units on that one.
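For filling in a units column, one rough first pass is to key off each variable's concept text: ACS dollar-valued concepts such as median household income say in the concept itself that they are in inflation-adjusted dollars for the past 12 months, while most other tables are plain counts (of people, households, or workers) rather than hundreds or thousands. The sketch below uses a hypothetical `guess_units()` helper and only proposes a label; each entry should still be confirmed against the Census table documentation.

```r
# Propose a units label from a variable's concept text (heuristic only).
# Anything without "DOLLARS" in the concept is assumed to be a count.
guess_units <- function(concept) {
  is_dollars <- grepl("DOLLARS", concept, ignore.case = TRUE)
  is_annual  <- grepl("PAST 12 MONTHS", concept, ignore.case = TRUE)
  ifelse(is_dollars & is_annual, "dollars (inflation-adjusted, past 12 months)",
         ifelse(is_dollars, "dollars", "count"))
}

guess_units(c(
  "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2020 INFLATION-ADJUSTED DOLLARS)",
  "MEANS OF TRANSPORTATION TO WORK"
))
```

A "count" label still leaves open what is being counted, so the description column should say whether it is people, households, or workers.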

erklopez commented 1 year ago

@alisonsnwong it looks like the income you included is actually block data, you might want to check if that is messing with your code. If we want to stick to tract data only, it looks like we would have to look into individual income instead, since there is no option to do tract data with family or household income. I can fix this, just let me know what you think.

MicheleTobias commented 1 year ago

Good catch! Yes, we need to use tract data to match up with the previous analysis.

MicheleTobias commented 1 year ago

@erklopez if you found a tract-level income variable, please swap it into the census variables table. @alisonsnwong I believe your code should work with the new set of variables; you just need to swap in the list of variables.

alisonsnwong commented 1 year ago

I believe the geography column shows the smallest geography at which a given variable is available. Since the block group level is smaller than the tract level, tract data are also available for income (or any other variable that shows block group in its geography column). Here's an article I found about it: https://walker-data.com/tidycensus/articles/basic-usage.html
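The rule described here follows from how these Census geographies nest: block groups sit inside tracts, tracts inside counties, and counties inside states, so a variable published down to the block group level can also be requested by tract. A sketch of that check, assuming the geography column uses the strings below (a simplification; the real hierarchy has more levels, and unrecognized values come back as NA); `available_at_tract()` is a hypothetical helper:

```r
# Smallest-to-largest order of the nested geographies considered here (simplified).
geo_levels <- c("block group", "tract", "county", "state")

# TRUE if the smallest published geography is tract or finer; NA if unrecognized.
available_at_tract <- function(smallest_geo) {
  rank <- match(smallest_geo, geo_levels)
  rank <= match("tract", geo_levels)
}

available_at_tract(c("block group", "tract", "county", "place"))
```

Returning NA rather than FALSE for unknown values keeps those variables visible for a manual check instead of silently dropping them.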

MicheleTobias commented 1 year ago

@alisonsnwong so it sounds like the code you're working on already deals with this then?

alisonsnwong commented 1 year ago

Yeah. I think most of the variables we found should work as long as we specify geography = 'tract' in the argument when using get_acs().

MicheleTobias commented 1 year ago

Fantastic teamwork! These are important questions to ask and make sure we've addressed, and I'm glad to hear we've got this covered already.

MicheleTobias commented 1 year ago

I think we're happy with the census variables so I'm going to close this issue. If there's something specific we need to address, let's open a new issue.

datalab-dev / graves-endocrine_surgeons

Census Variables #7