Open MicheleTobias opened 1 year ago
Some Options:
USDA Economic Research Service Rural Classifications - classifies counties into metro (metropolitan) and non-metro. The trouble with this is that the west coast has large counties, so all of San Bernardino, for example, is "metro", which really isn't true.
Purdue's Index of Relative Rurality (IRR) 2000 & 2010 - doesn't really fit our analysis timeframe. Also by county.
The US Census might have classified their geographies as rural or not. --> Can we get this information out of tidycensus?
US Census Bureau Urban and Rural Areas - essentially, the TIGER shapefiles have the urban areas delineated. This is our best bet right now and will follow the logic of our current plan.
Paper: Which Definition of Rurality Should I Use? Medical Care: 59():p S413-S419, October 2021 - compares different rurality criteria and how different they are from each other.
The Census' Urban Areas data is based on population density.
"For the 2020 Census, an urban area will comprise a densely settled core of census blocks that meet minimum housing unit density and/or population density requirements. This includes adjacent territory containing non-residential urban land uses. To qualify as an urban area, the territory identified according to criteria must encompass at least 2,000 housing units or a population of at least 5,000."
The result is polygons of urban areas. The boundary cuts across tracts (presumably because they were based on blocks). See the screenshot of Pahrump, NV as an example (yellow is census tracts, pink is urban areas).
The Federal Register documents the exact criteria used to delineate the Census' urban areas classification.
Also, our timing is good for this variable. It seems this data wasn't available until December 2022.
The Census' Urban Areas data really only defines "urban areas" and "not urban areas". The "not urban areas" include rural, suburban, and other classes in between. So any question we ask using this dataset really only tells us who is in urban areas, not who is in rural areas specifically.
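To make that limitation concrete, here is a minimal sketch of the final size test from the 2020 criteria quoted above. It assumes the density-based core has already been delineated (that step is not modeled), and the territory names and counts are made up for illustration.

```python
# Size thresholds taken from the 2020 criteria quoted in this thread.
MIN_HOUSING_UNITS = 2000
MIN_POPULATION = 5000


def census_urban_label(housing_units: int, population: int) -> str:
    """Label a candidate territory using only the final size test.

    The density-based delineation of the core is NOT modeled here.
    """
    if housing_units >= MIN_HOUSING_UNITS or population >= MIN_POPULATION:
        return "urban area"
    # The only alternative is "not urban area"; nothing is ever labeled "rural".
    return "not urban area"


# Hypothetical candidate territories: (housing units, population).
candidates = {
    "territory_a": (2500, 4000),
    "territory_b": (1200, 5200),
    "territory_c": (800, 1900),
}
labels = {
    name: census_urban_label(units, pop)
    for name, (units, pop) in candidates.items()
}
```

The function has exactly two outputs, so anything suburban or in between lands in "not urban area" alongside truly rural places.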
The Federal Office of Rural Health Policy (FORHP, part of HRSA) data is based on census tracts and modifies the USDA ERS Rural-Urban Commuting Area Codes (RUCA) to handle larger tracts better. The file "Non-Metro Counties (Micropolitan and non-core based counties) and Eligible Census Tracts in Metropolitan Counties" looks like what we want.
It's based on the 2010 RUCA codes, so it uses the 2010 Census.
It includes the US and PR.
CDC's National Center for Health Statistics (NCHS) Urban-Rural Classification Scheme for Counties is, predictably, based on counties and doesn't seem to have sub-divisions. 2013 data is currently available, with the 2023 update coming sometime this month.
FiveThirtyEight article: How Urban Or Rural Is Your State? And What Does That Mean For The 2020 Election? has a discussion of rurality indexes used in predicting the 2020 election.
FiveThirtyEight's metric uses census tracts to classify larger areas and claims to be an indication of how many neighbors people have.
The preprint, A fine-grained, versatile index of remoteness to characterize place-level rurality, has a good discussion of some of the indexes listed above and their limitations. This paper proposes a place-based metric that looks good visually, but I think the underlying calculations might not represent what we want: they use point-based metrics, apply them to areas, and don't seem to normalize (by density) for larger and smaller areas.
BUT! It makes me think that the HRSA data is a good choice, except that it is for 2010.
The VA uses the RUCA data definition of rurality.
The Federal Office of Rural Health Policy data looks reasonable on a map, but I need to investigate why some parts of Alaska and some big tracts in the lower 48 are not classified as rural, just to be sure the code is doing what I think it's doing.
I found the mistake in my code and now the results look correct.
It looks like there are topological errors in the 2010 tract data, so there are slivers between tracts that should touch, especially along state boundaries. This is creating invalid polygons when we intersect the tracts with the isochrones.
BUT! Now that I'm thinking about it... since every piece of the isochrone-tract intersection carries the tract GEOID, why didn't I just assign a rurality classification based on that? It would run so much faster because we wouldn't need to do a spatial process in R.
Oh, right! Because we're working with 2010 tracts for the rurality designation and 2020 for the rest of the analysis.
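A minimal sketch of the GEOID shortcut, and of how the vintage mismatch shows up, using only dictionaries. All GEOIDs, labels, and field names here are made up (the `99` state prefix is deliberately not a real FIPS code), and the tract split is a hypothetical case.

```python
# Hypothetical 2010-vintage rurality lookup keyed by tract GEOID.
rurality_2010 = {
    "99001000100": "rural",
    "99001000200": "not rural",
}

# Hypothetical isochrone/tract pieces carrying 2020-vintage GEOIDs.
# Tract ...000200 is assumed to have been split into ...000201 and ...000202.
isochrone_pieces_2020 = [
    {"geoid": "99001000100", "minutes": 30},
    {"geoid": "99001000201", "minutes": 30},
    {"geoid": "99001000202", "minutes": 60},
]


def attach_rurality(pieces, lookup):
    """Join a rurality label onto each piece by GEOID.

    Returns (matched pieces, GEOIDs with no entry in the lookup).
    """
    matched, unmatched = [], []
    for piece in pieces:
        label = lookup.get(piece["geoid"])
        if label is None:
            unmatched.append(piece["geoid"])
        else:
            matched.append({**piece, "rurality": label})
    return matched, unmatched


matched, unmatched = attach_rurality(isochrone_pieces_2020, rurality_2010)
print(f"{len(matched)} matched, unmatched GEOIDs: {unmatched}")
```

Unmatched GEOIDs are the visible failure. A GEOID that exists in both vintages can also cover a somewhat different area, which a plain attribute join would not flag.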
Maybe doing an intersection would be sufficient?
It looks like the terra package can identify and remove gaps? https://search.r-project.org/CRAN/refmans/terra/html/gaps.html It's hard to tell whether it removes holes inside polygons or the spaces between polygons.
Maybe use gaps and snap together or just snap by itself.
https://rspatial.github.io/terra/reference/terra-package.html
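To illustrate what snapping would do to a sliver, here is a toy vertex-to-vertex snap in plain Python. It is only a sketch of the idea, not how terra or sf implement it (real tools also snap to edges), and the coordinates and tolerance are invented.

```python
import math


def snap_vertices(ring, targets, tolerance):
    """Move each vertex in `ring` onto the nearest vertex in `targets`
    when that vertex lies within `tolerance`; otherwise leave it alone."""
    snapped = []
    for x, y in ring:
        best = None
        best_dist = tolerance
        for tx, ty in targets:
            dist = math.hypot(tx - x, ty - y)
            if dist <= best_dist:
                best, best_dist = (tx, ty), dist
        snapped.append(best if best is not None else (x, y))
    return snapped


# Two hypothetical tracts that should share the edge x = 1.0,
# but tract_b is offset slightly, leaving a thin sliver between them.
tract_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tract_b = [(1.001, 0.0), (2.0, 0.0), (2.0, 1.0), (1.002, 1.0)]

tract_b_snapped = snap_vertices(tract_b, tract_a, tolerance=0.01)
```

After snapping, the left edge of `tract_b` coincides with the right edge of `tract_a`, so the sliver is closed. The tolerance needs to be larger than the sliver width but much smaller than any real feature, or valid vertices will be moved too.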
The missing census tracts seem to be tracts that crossed the boundary of the non-rural polygons, and only the piece that should have been inside the non-rural boundary is missing, so something seems to have gone awry with the st_intersection.
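One quick way to list which tracts dropped out is to compare the GEOIDs going into the intersection with the GEOIDs coming out. This is a minimal sketch with made-up GEOIDs. It assumes both layers expose the tract GEOID as an attribute, and it only detects tracts that vanished entirely, not tracts that lost part of their area.

```python
def missing_geoids(input_geoids, output_geoids):
    """Return the sorted GEOIDs present in the input but absent from the output."""
    return sorted(set(input_geoids) - set(output_geoids))


# Hypothetical GEOIDs: tracts fed into the intersection ...
tracts_in = ["99001000100", "99001000200", "99001000300"]
# ... and GEOIDs found on the resulting pieces (a tract can appear more than once).
pieces_out = ["99001000100", "99001000100", "99001000300"]

dropped = missing_geoids(tracts_in, pieces_out)
print("Tracts missing after intersection:", dropped)
```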
Another variable to incorporate after we complete the census data analysis is a rurality index. We need to: