Closed kaseyzapatka closed 3 years ago
You're roughly on the right track. Your map_df approach works if you do the following:
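library(dplyr)
library(purrr)
library(segregation)

cbsa <- c(35620, 37980)
# compute local scores separately for each CBSA and row-bind the results
ls_by_CBSA <- map_df(cbsa, function(x) {
  NYC_Philly_2019 %>%
    dplyr::filter(CBSA == x) %>%
    mutual_local(data = .,
                 group = "race",
                 unit = "tractid",
                 weight = "n",
                 wide = TRUE)
})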
But then you don't have the cbsa codes in the data. So what I'd do is this:
NYC_Philly_2019 %>%
  group_by(CBSA) %>%
  group_modify(~ mutual_local(data = .x,
                              group = "race",
                              unit = "tractid",
                              weight = "n",
                              wide = TRUE))
This is the simplest approach, and you can group_by year and other characteristics as well.
Let me know if that works!
Thanks, @elbersb. Your code worked perfectly and I was able to incorporate group_modify into my general workflow, which made it much easier to get various segregation estimates from your segregation package with my data structure. Thanks for the coding tip!
One final question: is it possible to decompose differences over time by CBSA, so that I have a decomposition for each CBSA?
I don't think the group_modify approach works in this case because mutual_difference requires two inputs and I don't know how to pipe two inputs into the function. The map2_df approach worked (code below), but
Any thoughts? I can always do it manually but then I'll be choosy about which CBSAs I run the decomposition for. Thanks again!
Sample 2000 and 2019 data if helpful.
# test mapping function over both CBSAs ----------------------------------------
# trying to be fancy and split the dataset and loop over resulting list
# by_information_data <- information_data %>%
# filter(year != 2013) %>%
# split(.$year) %>%
test <- map2_dfr(NYC_Philly_2019, NYC_Philly_2019,
                 ~ mutual_difference(data1 = NYC_Philly_2000,
                                     data2 = NYC_Philly_2019,
                                     group = "race",
                                     unit = "tractid",
                                     weight = "n",
                                     method = "shapley"))
print(test)
View(test)
group_modify takes any function, so the easiest is to put the two years together into one dataset (via bind_rows or so), and then have:
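# adjust the years and column names to your data
# (the data in this thread uses 2000 and 2019, "tractid", and CBSA)
diff <- function(df, group) {
  # df holds the rows for one CBSA; group is the one-row key (unused here)
  y1 <- filter(df, year == 2000)
  y2 <- filter(df, year == 2005)
  mutual_difference(y1, y2, group = "race", unit = "tract", weight = "n")
}

data %>%
  group_by(cbsa) %>%
  group_modify(diff)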
I haven't tested this exact code, but I've used this pattern often in my own work.
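For the bind_rows step mentioned above, here is a minimal sketch of how the two yearly files could be stacked with a year column and then split back apart inside group_modify. The toy data frames (tracts_2000, tracts_2019) and the helper change_in_n are hypothetical stand-ins, and the helper only sums populations; in the real workflow, the mutual_difference call would go in its place.

```r
library(dplyr)

# Hypothetical toy data standing in for the two yearly tract files.
tracts_2000 <- tibble(
  CBSA    = c(35620, 35620, 37980, 37980),
  tractid = c("a", "b", "c", "d"),
  n       = c(100, 200, 50, 150)
)
tracts_2019 <- tibble(
  CBSA    = c(35620, 35620, 37980, 37980),
  tractid = c("a", "b", "c", "d"),
  n       = c(120, 180, 80, 160)
)

# Stack the two years into one data frame, tagging each row with its year.
stacked <- bind_rows(
  mutate(tracts_2000, year = 2000),
  mutate(tracts_2019, year = 2019)
)

# Hypothetical per-CBSA function: df holds the rows of one CBSA, key is the
# one-row grouping key. It splits the rows back into the two years and
# returns a one-row summary (a stand-in for a mutual_difference() call).
change_in_n <- function(df, key) {
  y1 <- filter(df, year == 2000)
  y2 <- filter(df, year == 2019)
  tibble(n_2000 = sum(y1$n), n_2019 = sum(y2$n), change = sum(y2$n) - sum(y1$n))
}

result <- stacked %>%
  group_by(CBSA) %>%
  group_modify(change_in_n)
```

group_modify prepends the grouping column to whatever the function returns, so result has one row per CBSA with the CBSA code attached.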
Thanks @elbersb, worked perfectly. That was so easy/obvious once I saw it in action. Thanks for all your help with learning this package. I really like it, its versatility, and what this approach will allow me to do in my analyses. Looking forward to seeing what I find.
Glad it worked!
Finally added this to the FAQ: https://elbersb.github.io/segregation/articles/faq.html#how-can-i-compute-indices-for-different-areas-at-once-
Let me know what else would be useful to put there!
Hi Ben,
Thanks so much for putting segregation together. It's a great package, very well documented, and tidyverse friendly. I'm hoping to use it to create measures of M and H for my dissertation analyses. I have what I'm sure is a fairly simple coding problem (I'm a recent R convert from Stata, so my question might be due to a lack of sophisticated R skills).
I have data on every census tract in the country for 2000, 2009-2013, and 2015-2019 for five race-ethnic groups: non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, Hispanic, and non-Hispanic Other. Here is a dropbox link to sample data with all 2019 census tracts in NYC and Philly CBSAs.
I want to group the data by CBSA (and potentially by counties within CBSAs if possible) and calculate local segregation scores based on each tract's deviation from its respective CBSA distribution, instead of all tracts in the country. I've broken my data into one dataset for each year to reduce coding complexity. So, the following example data frame contains all census tracts in NYC and Philly CBSAs and takes the following format:
Running the following code produces local segregation scores for each tract:
However, I want local segregation scores that report each tract's deviation from its respective CBSA distribution, not all tracts in the country. Do you know how I could do this? I tried to simply group_by CBSA before I ran the mutual_local function, but I seem to be getting the same scores regardless. My second thought was to loop the mutual_local function over a vector of CBSA values, but that just got me two sets of the same results for each CBSA.
Here are my coding attempts at producing different scores by CBSA using group_by and map_df:
If I compare all three data frames, I'm getting the same results for each, but I would think they would differ if grouped by CBSA.
I don't see a within function like for mutual_total but maybe I missed something? Any help would be greatly appreciated!
Thanks, Kasey.