rsegregation
A shortcut package with formulas for several different indices of
segregation. rsegregation is designed to fit into the tidyverse
framework, particularly dplyr.
Installation
The development version from GitHub can be
installed with:
# install.packages("devtools")
devtools::install_github("arthurgailes/rsegregation")
Usage
rsegregation depends upon dplyr (>1.0.0), and can be used with it. To
return a single divergence score for Bay Area County:
Divergence and Entropy
Calculate the divergence score for the entire dataset
rsegregation can work with base r, or within several dplyr verbs:
library(rsegregation)
library(dplyr)
## included dataset of Bay Area Census tracts
# Using dplyr
bay_divergence <- bay_race %>%
summarize(bay_divergence = divergence(white,black,asian, hispanic, all_other,
population=total_pop, summed = T))
# Using base r
bay_divergence <- divergence(bay_race[c('white','black','asian', 'hispanic', 'all_other')],
population=bay_race$total_pop, summed = T)
# or
bay_divergence <- divergence(bay_race$white,bay_race$black,bay_race$asian,
bay_race$hispanic, bay_race$all_other, population=bay_race$total_pop, summed = T)
# all return the same result:
bay_divergence
Calculate divergence by group
Using the included Bay Area dataset of 2010 racial groups, divergence
can be calculated by county using dplyr::group_by()
.
#library(dplyr)
group_by(bay_race, county) %>%
summarize(bay_divergence = divergence(white,black,asian, hispanic, all_other,
population=total_pop, summed = T))
| county | bay\_divergence |
|:---------------------------------------|----------------:|
| Alameda County, California, 2010 | 0.2450583 |
| Contra Costa County, California, 2010 | 0.2129913 |
| Marin County, California, 2010 | 0.1304815 |
| Napa County, California, 2010 | 0.1459522 |
| San Francisco County, California, 2010 | 0.2056087 |
| San Mateo County, California, 2010 | 0.2387524 |
| Santa Clara County, California, 2010 | 0.2093378 |
| Solano County, California, 2010 | 0.1333189 |
| Sonoma County, California, 2010 | 0.0756877 |
By-observation divergence scores
Divergence and entropy are both calculated rowwise by default (summed =
FALSE).
bay_entropy <- bay_race
bay_entropy$entropy <- entropy(bay_race[c('white','black','asian',
'hispanic','all_other')], population=bay_race$total_pop, summed = F)
head(bay_entropy)
| fips | total\_pop | hispanic | white | black | asian | all\_other | county | entropy |
|:------------|-----------:|----------:|----------:|----------:|----------:|-----------:|:---------------------------------|----------:|
| 06001400100 | 2937 | 0.0398366 | 0.7075247 | 0.0476677 | 0.1552605 | 0.0497106 | Alameda County, California, 2010 | 0.9566644 |
| 06001400200 | 1974 | 0.0764944 | 0.7831814 | 0.0157042 | 0.0739615 | 0.0506586 | Alameda County, California, 2010 | 0.7969746 |
| 06001400300 | 4865 | 0.0820144 | 0.6692703 | 0.1052415 | 0.0861254 | 0.0573484 | Alameda County, California, 2010 | 1.0859266 |
| 06001400400 | 3703 | 0.0896570 | 0.6546044 | 0.1209830 | 0.0729139 | 0.0618417 | Alameda County, California, 2010 | 1.1121719 |
| 06001400500 | 3517 | 0.0966733 | 0.5055445 | 0.2652829 | 0.0591413 | 0.0733580 | Alameda County, California, 2010 | 1.2816122 |
| 06001400600 | 1571 | 0.0802037 | 0.4271165 | 0.3914704 | 0.0509230 | 0.0502864 | Alameda County, California, 2010 | 1.2348325 |
Miscellaneous
Dataframes should be formatted as long on geographic observations
(e.g. tracts), but wide on group observations (e.g. races), as in the
included dataset of the San Francisco Bay Area.
head(bay_race)
| fips | total\_pop | hispanic | white | black | asian | all\_other | county |
|:------------|-----------:|----------:|----------:|----------:|----------:|-----------:|:---------------------------------|
| 06001400100 | 2937 | 0.0398366 | 0.7075247 | 0.0476677 | 0.1552605 | 0.0497106 | Alameda County, California, 2010 |
| 06001400200 | 1974 | 0.0764944 | 0.7831814 | 0.0157042 | 0.0739615 | 0.0506586 | Alameda County, California, 2010 |
| 06001400300 | 4865 | 0.0820144 | 0.6692703 | 0.1052415 | 0.0861254 | 0.0573484 | Alameda County, California, 2010 |
| 06001400400 | 3703 | 0.0896570 | 0.6546044 | 0.1209830 | 0.0729139 | 0.0618417 | Alameda County, California, 2010 |
| 06001400500 | 3517 | 0.0966733 | 0.5055445 | 0.2652829 | 0.0591413 | 0.0733580 | Alameda County, California, 2010 |
| 06001400600 | 1571 | 0.0802037 | 0.4271165 | 0.3914704 | 0.0509230 | 0.0502864 | Alameda County, California, 2010 |
Future development:
- decomposition of entropy index
- more measures of segregation
License
This package is free and open source software, licensed under GPL-3.