kaseyzapatka closed 11 months ago
mutual_local computes local segregation scores (see here or this paper for reference). If you simply want an H index for each county, you need counts by race and tract (or another spatial unit), and then use mutual_total, which will give you the H index.
Hi @elbersb,
Thanks for responding! I've used this package a few times and it's just so great and comprehensive. Thanks for making and maintaining it.
I had seen the documentation you mentioned, but couldn't figure out how to use mutual_total to return H and M scores for every county across the country. I realized that I wanted the group_modify workflow you mention in the documentation, but I kept getting an error saying my "group variable is constant". The answer was to filter out counties that contain only one census tract. When I merge the results back with the national county-level dataset, those counties will just be missing.
Including my code in case it is helpful for others trying to calculate H or M for counties using tract data across the country.
library(dplyr)
library(segregation)

data %>%
  # group by county
  group_by(county_fips) %>%
  # count rows per county (one row per tract x race category)
  mutate(count = n()) %>%
  # drop counties with only one tract, which have no variation for the calculation;
  # adjust the threshold based on your number of race categories
  filter(count > 5) %>%
  # compute M and H separately for each county
  group_modify(~ mutual_total(data = .x,
                              group = "race",
                              unit = "tract_fips",
                              weight = "n")) %>%
  glimpse()
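A possible variation on the filter step above, as a sketch only: instead of a row-count threshold that depends on the number of race categories, count distinct tracts (and distinct races) per county directly. The toy data frame and its values are hypothetical, and the assumption that single-tract or single-race counties are the only degenerate cases is mine, not something confirmed in this thread.

```r
library(dplyr)
library(segregation)

# Hypothetical toy data in long format: one row per tract x race.
# County "002" has a single tract and should be dropped by the filter.
toy <- data.frame(
  county_fips = c("001", "001", "001", "001", "002", "002"),
  tract_fips  = c("001A", "001A", "001B", "001B", "002A", "002A"),
  race        = c("white", "black", "white", "black", "white", "black"),
  n           = c(90, 10, 10, 90, 50, 50)
)

result <- toy %>%
  group_by(county_fips) %>%
  # keep counties with at least two tracts and at least two race categories
  filter(n_distinct(tract_fips) > 1, n_distinct(race) > 1) %>%
  # one M row and one H row per remaining county
  group_modify(~ mutual_total(data = .x,
                              group = "race",
                              unit = "tract_fips",
                              weight = "n"))

result
```

This avoids hard-coding the number of categories, but it counts rows regardless of whether `n` is zero, so rows with zero counts may need to be removed first.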
Thanks!
Hi @elbersb,
I'm working on a project and we are using your segregation package to calculate a few segregation indices. Ultimately, my PI and I want to calculate H (Theil's index) using tract data within counties for 2021 5-year ACS data, so that we have a segregation score for every county. This means my data frame is at the census tract level, nested within counties. My PI's and my understanding is that to calculate H (Theil's index) for every county, we need race data for every tract (maybe this is wrong?). Again, we ultimately want a measure of within-county segregation.
Using the mutual_local function, I had two thoughts:
Option 1: My thought was to specify tract as the unit, grouped by county, but this returns a score for every tract as shown below, which is too many.
Option 2: Alternatively, using the same data, we can specify county as the unit and it returns a measure for every county. I find the same results using county-level data or tracts within counties, which makes me doubt that the tract-level information is being used in the calculations.
My question is: which option gives me what I want (an H score for every county)? I'm not confident that option 2 is using the tract-level data, but option 1 returns a score for every tract, which is too granular. Should we use option 1 and average over tracts grouped by county somehow (this seems wrong)? Or use option 2 and assume the function is using the tract data in the county-level calculations (this seems dubious because I get the same scores using tract or county-level data)?
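A note on Option 2, as a sketch under assumptions: if county is the unit, an index of this kind only depends on the county-by-race table of counts, so the tract dimension is summed out before anything is computed. That would explain why tract-level and county-level inputs give the same scores. The toy data below is hypothetical and uses base R only; it illustrates the aggregation, not the package's internals.

```r
# Hypothetical toy data: one row per tract x race, tracts nested in counties.
tracts <- data.frame(
  county = c("A", "A", "A", "A", "B", "B", "B", "B"),
  tract  = c("A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"),
  race   = rep(c("white", "black"), times = 4),
  n      = c(90, 10, 10, 90, 50, 50, 40, 60)
)

# The same data pre-aggregated to the county level.
counties <- aggregate(n ~ county + race, data = tracts, FUN = sum)

# County x race count tables built from each input.
table_from_tracts   <- tapply(tracts$n, list(tracts$county, tracts$race), sum)
table_from_counties <- tapply(counties$n, list(counties$county, counties$race), sum)

# The two tables match, so any index computed from them matches too.
same_table <- isTRUE(all.equal(table_from_tracts, table_from_counties))
same_table
```

In other words, with county as the unit the result describes segregation between counties, not segregation between tracts within each county.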
Also, I realize mutual_local calculates M, not H. Am I correct in understanding that H is just a normalized version of M? How is this best explained to my research team as to why we are using M and not H?
Thanks for your help.
Best, Kasey
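For readers with the same question about M and H, here is a minimal base R sketch of the relationship, assuming H is defined as M divided by the entropy of the overall race distribution (my reading of how the package defines it; please check the documentation). The counts are hypothetical and describe a single county with two tracts.

```r
# Hypothetical counts for one county: rows are tracts (units), columns are races (groups).
counts <- matrix(
  c(90, 10,
    10, 90),
  nrow = 2, byrow = TRUE,
  dimnames = list(tract = c("t1", "t2"), race = c("white", "black"))
)

p_joint <- counts / sum(counts)  # joint distribution of tract and race
p_tract <- rowSums(p_joint)      # marginal distribution of tracts
p_race  <- colSums(p_joint)      # marginal distribution of races

# M: mutual information between tract and race (zero cells contribute 0).
ratio   <- p_joint / outer(p_tract, p_race)
m_index <- sum(ifelse(p_joint > 0, p_joint * log(ratio), 0))

# Entropy of the overall race distribution.
entropy_race <- -sum(p_race[p_race > 0] * log(p_race[p_race > 0]))

# H: M rescaled by the race entropy, so it falls between 0 and 1.
h_index <- m_index / entropy_race

c(M = m_index, H = h_index)
```

Under this definition H carries the same information as M for a given county, rescaled by how diverse the county is overall, which is one way to explain the difference to a research team.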