datalab-dev / graves-endocrine_surgeons

Claire Graves' 2022 collaboration to identify access to endocrine surgery centers
GNU General Public License v3.0

Rurality Index #18

Open MicheleTobias opened 1 year ago

MicheleTobias commented 1 year ago

Another variable to incorporate after we complete the census data analysis is a rurality index. We need to:

MicheleTobias commented 1 year ago

Some Options:

USDA Economic Research Service Rural Classifications - classifies counties into metro (metropolitan) and non-metro. The trouble with this is that the west coast has large counties, so all of San Bernardino, for example, is "metro", which really isn't true.

Purdue's Index of Relative Rurality (IRR) 2000 & 2010 - doesn't really fit our analysis timeframe. Also by county.

The US Census might have classified their geographies as rural or not. --> Can we get this information out of tidycensus?

US Census Bureau Urban and Rural Areas - essentially, the TIGER shapefiles have the urban areas delineated. This is our best bet right now, and it follows the logic of our current plan.

MicheleTobias commented 1 year ago

Paper: Which Definition of Rurality Should I Use? Medical Care: 59():p S413-S419, October 2021 - compares different rurality criteria and how different they are from each other.

MicheleTobias commented 1 year ago

The Census' Urban Areas data is based on population density.

"For the 2020 Census, an urban area will comprise a densely settled core of census blocks that meet minimum housing unit density and/or population density requirements. This includes adjacent territory containing non-residential urban land uses. To qualify as an urban area, the territory identified according to criteria must encompass at least 2,000 housing units or a population of at least 5,000."
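
As a rough illustration of the size threshold in that quote, here is a minimal Python sketch. It only models the final "2,000 housing units or 5,000 people" test; the block-level density rules that build the candidate territory are not represented, and the function and constant names are made up for this example.

```python
# Size thresholds taken from the 2020 Census urban area criteria quoted above.
MIN_HOUSING_UNITS = 2000
MIN_POPULATION = 5000


def meets_urban_size_threshold(housing_units: int, population: int) -> bool:
    """Return True if a candidate territory passes the final size test.

    Only the size check is modeled here. The density rules that decide
    which census blocks belong to the territory are out of scope.
    """
    return housing_units >= MIN_HOUSING_UNITS or population >= MIN_POPULATION
```

Either condition is enough on its own, so a territory with many housing units but a small resident population can still qualify.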

The result is polygons of urban areas. The boundary cuts across tracts (presumably because the urban areas are built from blocks). See the screenshot of Pahrump, NV as an example (yellow is census tracts, pink is urban areas).

[Screenshot 2023-02-03 160155]

MicheleTobias commented 1 year ago

The Federal Register documents the exact criteria used to delineate the Census' urban areas classification.

Also, our timing is good for this variable. It seems this data wasn't available until December 2022.

MicheleTobias commented 1 year ago

The Census' Urban Areas dataset really only defines "urban areas" and "not urban areas". Those "not urban areas" include rural, suburban, and other classes in between. So any question we ask using this dataset really only tells us who is in urban areas, not who is in rural areas specifically.

MicheleTobias commented 1 year ago

HRSA's Federal Office of Rural Health Policy (FORHP) data is based on census tracts and modifies the USDA ERS Rural-Urban Commuting Area Codes (RUCA) to handle larger tracts better. The file Non-Metro Counties (Micropolitan and non-core based counties) and Eligible Census Tracts in Metropolitan Counties looks like what we want.

It's based on the 2010 RUCA codes, so it uses the 2010 Census.

It includes the US and PR.

MicheleTobias commented 1 year ago

CDC's National Center for Health Statistics (NCHS) Urban-Rural Classification Scheme for Counties is, predictably, based on counties and doesn't seem to have sub-divisions. 2013 data is currently available, with the 2023 update coming sometime this month.

MicheleTobias commented 1 year ago

FiveThirtyEight Article: How Urban Or Rural Is Your State? And What Does That Mean For The 2020 Election? has a discussion of rurality indexes used in predicting the 2020 election.

FiveThirtyEight's metric uses census tracts to classify larger areas and claims to be an indication of how many neighbors people have.

MicheleTobias commented 1 year ago

The preprint, A fine-grained, versatile index of remoteness to characterize place-level rurality, has a good discussion of some of the indexes listed above and their limitations. This paper proposes a place-based metric that looks good visually, but I think the underlying calculations might not represent what we want: they use point-based metrics, apply them to areas, and don't seem to normalize (by density) for larger and smaller areas.

BUT! It makes me think that the HRSA data is a good choice, except that it is for 2010.

MicheleTobias commented 1 year ago

The VA uses the RUCA data definition of rurality.

MicheleTobias commented 1 year ago

The Federal Office of Rural Health Policy (FORHP) data looks reasonable on a map, but I need to investigate why some parts of Alaska and some big tracts in the lower 48 are not rural, just to be sure the code is doing what I think it's doing.

MicheleTobias commented 1 year ago

I found the mistake in my code and now the results look correct.

[image]

MicheleTobias commented 1 year ago

It looks like there are topological errors in the 2010 tract data, so there are slivers between tracts that should touch, especially along state boundaries. This is creating invalid polygons when we do the intersection with the isochrones.

BUT! Now that I'm thinking about it... since every piece of the isochrone-tract intersection already has the tract GEOID, why didn't I just assign a rurality classification based on that? It would run so much faster because we wouldn't need to do a spatial process in R.
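
The GEOID idea could be sketched roughly like this. It is a plain Python illustration rather than the project's R code, and the function name, argument names, and the example tract IDs in the usage note are all hypothetical. GEOIDs are kept as strings so leading zeros survive.

```python
from typing import Dict, Iterable, Optional, Set


def classify_by_geoid(
    tract_geoids: Iterable[str],
    rural_geoids: Set[str],
    known_geoids: Set[str],
) -> Dict[str, Optional[bool]]:
    """Label each tract as rural (True) or not rural (False) by ID lookup.

    known_geoids is every tract ID present in the rurality table and
    rural_geoids is the subset flagged rural. A tract whose ID is absent
    from the table gets None instead of being treated as not rural.
    """
    labels: Dict[str, Optional[bool]] = {}
    for geoid in tract_geoids:
        if geoid not in known_geoids:
            labels[geoid] = None  # no match in the rurality table
        else:
            labels[geoid] = geoid in rural_geoids
    return labels
```

For example, `classify_by_geoid(["00000000001", "00000000003"], {"00000000001"}, {"00000000001", "00000000002"})` labels the first made-up ID rural and the second `None`, which makes any ID mismatch between the two tables easy to count.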

MicheleTobias commented 1 year ago

Oh, right! Because we're working with 2010 tracts for the rurality designation and 2020 for the rest of the analysis.

MicheleTobias commented 1 year ago

Maybe doing an intersection would be sufficient?
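
One way the intersection route might work, sketched in plain Python under the assumption that the spatial intersection has already been run elsewhere and each resulting piece carries the 2020 tract GEOID, a rural flag from the 2010 layer, and an area. The tuple layout and function name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical piece layout: (2020 tract GEOID, overlaps a rural 2010 tract?, piece area)
Piece = Tuple[str, bool, float]


def rural_area_share(pieces: Iterable[Piece]) -> Dict[str, float]:
    """Return, per 2020 tract, the fraction of intersected area that is rural."""
    total_area: Dict[str, float] = defaultdict(float)
    rural_area: Dict[str, float] = defaultdict(float)
    for geoid, is_rural, area in pieces:
        total_area[geoid] += area
        if is_rural:
            rural_area[geoid] += area
    # Skip tracts with zero total area to avoid dividing by zero.
    return {
        geoid: rural_area[geoid] / total
        for geoid, total in total_area.items()
        if total > 0
    }
```

A share near 0 or 1 gives a clear label. What cutoff to use for tracts in between is a judgment call this sketch does not make.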

MicheleTobias commented 1 year ago

The terra package looks like it can identify and remove gaps? https://search.r-project.org/CRAN/refmans/terra/html/gaps.html It's hard to tell if it removes holes from inside polygons or the spaces between polygons.

MicheleTobias commented 1 year ago

Maybe use gaps and snap together or just snap by itself.

https://rspatial.github.io/terra/reference/terra-package.html

MicheleTobias commented 1 year ago

The missing census tracts seem to be tracts that crossed the boundary of the non-rural polygons, and the only part missing is the piece that should have been inside the non-rural boundary, so something seems to have gone awry with the st_intersection.
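
A quick diagnostic for this kind of loss, sketched in plain Python. It assumes both the rural and non-rural pieces are kept after the intersection, so the piece areas for a tract should add back up to the tract's original area. The function name, argument layout, and default tolerance are hypothetical.

```python
from typing import Dict, List


def area_shortfall(
    original_area: Dict[str, float],
    piece_areas: Dict[str, List[float]],
    tolerance: float = 0.01,
) -> Dict[str, float]:
    """Report tracts whose intersected pieces do not cover their full area.

    Returns a mapping of GEOID to the fraction of the original area that
    is missing, limited to tracts where that fraction exceeds tolerance.
    Tracts with no pieces at all are reported as fully missing (1.0).
    """
    report: Dict[str, float] = {}
    for geoid, area in original_area.items():
        if area <= 0:
            continue  # cannot compute a fraction for an empty geometry
        recovered = sum(piece_areas.get(geoid, []))
        missing = 1.0 - recovered / area
        if missing > tolerance:
            report[geoid] = missing
    return report
```

Tracts that show up with a partial shortfall are the ones that lost a piece at a boundary. Tracts at 1.0 dropped out entirely.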