The purpose of this repository is to provide a simple and reproducible R pipeline to investigate residential segregation (RS) using US census data. The pipeline contains two components:
*: The implementation of the three scores in this pipeline is neither an endorsement nor an emphasis. Researchers should study all residential segregation indices and choose the ones that match their study objectives.
We hope this work relieves social scientists of repetitive data pulling and inspires them to adopt reproducible pipelines.
Note: If you don’t have the time to go through the whole documentation, please finish reading the Remarks section before modifying the code.
- Install `targets` and `renv` if you don’t already have them.
- Run `renv::restore()` to install the required R packages. Please give permission to install the necessary packages. This will exactly mirror the package versions used in the creation of this work.
- In the `_targets.R` file, go through the items marked `TODO:`, which can be listed via a global search, i.e. cmd/control + shift + f.
- Variable codes change between census years. For example, `P003003` means `Total!!Population of one race!!White alone` in the 2000 data, and means `Total!!Black or African American alone` in the 2010 data.
- Run `targets::tar_make()` in the console to run the pipeline.

In this section, we provide two examples for calculating RS indices of one state (stored in the `master` branch) or multiple states (stored in the `meds_desert` branch), respectively.
In this example, we provide the pipeline to calculate the indices for a single state. As a bonus, code that plots an index on a map is supplied, as shown in Figure 1*. Figure 1 includes two maps of the 2010 Alabama county-level dissimilarity index, White (majority) with respect to Black (minority), calculated using different definitions of the lower-level geographic unit, i.e. census tract and block.
*: The example pipeline produces only one of the plots; the captions were modified manually.
Figure 1: 2010 Alabama Dissimilarity Index at county level calculated with census tract level statistics (a) and block level statistics (b)
Note: For those who are interested in the 2020 AL indices, please refer to the test case in Issue 7 for an example configuration.
This example demonstrates how to calculate residential segregation for multiple states collectively, either via an input file or via inline code. Please find the pipeline in the `meds_desert` branch.
In this section, we discuss our observations from creating the indices, including thoughts on numerical calculation with census data, interpretation, and data-sharing practice.
During the calculation, we observed that, depending on how the areal units are defined, it is possible to have areas with no majority or minority population at all, i.e. `n_majority = 0` or `n_minority = 0`. This complicates the calculation of the scores, for example by producing infinite or `NaN` scores. Having found no remedy in the literature so far, we define all such indices as missing values.
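To make the zero-population case concrete, here is a minimal sketch in R of a dissimilarity index that returns a missing value when either group is absent from the whole unit. The function name `dissimilarity`, its arguments, and the toy counts are assumptions made for illustration; they are not taken from the pipeline's code.

```r
# Dissimilarity index for one upper-level unit (e.g. a county), computed from
# counts in its lower-level units (e.g. census tracts).
dissimilarity <- function(n_majority, n_minority) {
  total_majority <- sum(n_majority)
  total_minority <- sum(n_minority)
  # Assumption for this sketch: if either group is absent from the whole unit,
  # the shares below would be 0/0, so return a missing value instead.
  if (total_majority == 0 || total_minority == 0) {
    return(NA_real_)
  }
  0.5 * sum(abs(n_majority / total_majority - n_minority / total_minority))
}

# Toy counts (made up for illustration, not census data).
majority <- c(100, 50, 0)
no_minority <- c(0, 0, 0)

# Without the guard, the minority shares are 0/0 and the score is NaN.
naive_score <- 0.5 * sum(abs(majority / sum(majority) - no_minority / sum(no_minority)))
is.nan(naive_score)

# With the guard, the score is reported as missing.
guarded_score <- dissimilarity(majority, no_minority)
is.na(guarded_score)

# A unit where both groups are present yields an ordinary score.
regular_score <- dissimilarity(majority, c(0, 20, 30))
```

Returning `NA_real_` keeps the result numeric, so the scores can still be stored in an ordinary numeric column and handled later with `is.na()`.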
It is very important to confirm that your variable codes match the variables you intend to use in the census database. Although we built a numerical check into the code to help prevent errors, we cannot be certain that it will catch every error, particularly given the flexibility that user customization allows. We show how a code changes across years in Get started.
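One way to guard against a mismatched code is to compare the label attached to a code against the label you expect before running any calculation. The sketch below is hypothetical: the lookup table `variable_labels` and the helper `check_variable` are names invented here, and the table holds only the two labels quoted in this README. In practice the labels would come from the census variable dictionary for the relevant year.

```r
# Hypothetical lookup table; the two labels are the ones quoted in this README.
variable_labels <- data.frame(
  year  = c(2000, 2010),
  code  = c("P003003", "P003003"),
  label = c("Total!!Population of one race!!White alone",
            "Total!!Black or African American alone"),
  stringsAsFactors = FALSE
)

# Stop early if the label for `code` in `year` does not contain `expected`.
check_variable <- function(code, year, expected, labels = variable_labels) {
  hit <- labels$label[labels$code == code & labels$year == year]
  if (length(hit) != 1) {
    stop("Expected exactly one label for ", code, " in ", year)
  }
  if (!grepl(expected, hit, fixed = TRUE)) {
    stop(code, " in ", year, " is '", hit, "', not '", expected, "'")
  }
  invisible(TRUE)
}

# The 2000 code matches the intended variable, so this passes silently.
ok_2000 <- check_variable("P003003", 2000, "White alone")

# The same code in 2010 refers to a different group, so this raises an error.
result_2010 <- try(check_variable("P003003", 2010, "White alone"), silent = TRUE)
```

A label check like this complements a numerical check, since two different variables can both produce plausible counts.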
In the calculation, we do not assume that the minority numerically complements the majority, i.e. that the minority and majority counts sum to the total. The resulting indices differ from those calculated under the complementing assumption.
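The effect of the complementing assumption can be seen on a small example. The sketch below uses made-up tract counts and a hypothetical `dissimilarity` helper (not the pipeline's own function) to compare the index computed from observed minority counts with the index computed when the minority is taken to be everyone who is not in the majority.

```r
# Dissimilarity index from counts in lower-level units (illustrative helper).
dissimilarity <- function(n_majority, n_minority) {
  0.5 * sum(abs(n_majority / sum(n_majority) - n_minority / sum(n_minority)))
}

# Toy counts for three tracts (made up for illustration, not census data).
tract_total    <- c(100, 100, 100)
tract_majority <- c(80, 50, 20)
tract_minority <- c(10, 20, 30)  # the remaining residents belong to other groups

# Index using the observed minority counts (no complementing assumption).
d_observed <- dissimilarity(tract_majority, tract_minority)

# Index if the minority were defined as total minus majority.
d_complement <- dissimilarity(tract_majority, tract_total - tract_majority)

c(observed = d_observed, complement = d_complement)
```

In this toy case the two values differ (about 0.367 versus 0.4), which is why indices from this pipeline should not be compared directly with indices computed under the complementing assumption.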
The indices are calculated according to their definitions, which means their interpretations can point in opposite directions, e.g. the interaction and isolation indices. Users will have to define their own reverse-coding function to obtain directionally consistent interpretations.
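As an illustration of the direction issue, the sketch below computes isolation and interaction indices in the form given by Massey and Denton (1988) and applies one possible reverse coding, `1 - index`, to the interaction index so that higher values indicate more segregation for both. The function names, the toy counts, and the choice of `1 - index` are assumptions made here; other reverse codings may suit a given study better.

```r
# Isolation: minority-weighted average of the minority share in each unit.
isolation_index <- function(n_minority, n_total) {
  sum(n_minority / sum(n_minority) * n_minority / n_total)
}

# Interaction: minority-weighted average of the majority share in each unit.
interaction_index <- function(n_minority, n_majority, n_total) {
  sum(n_minority / sum(n_minority) * n_majority / n_total)
}

# One possible reverse coding (an assumption, not the pipeline's choice).
reverse_code <- function(index) 1 - index

# Toy counts for three tracts (made up for illustration, not census data).
tract_total    <- c(100, 100, 100)
tract_majority <- c(80, 50, 20)
tract_minority <- c(10, 20, 30)

iso   <- isolation_index(tract_minority, tract_total)
inter <- interaction_index(tract_minority, tract_majority, tract_total)

# After reverse coding, larger values mean less exposure to the majority.
inter_reversed <- reverse_code(inter)
```

Because the minority and majority do not sum to the total here, the isolation index and the reverse-coded interaction index are not equal, so both may still be worth reporting.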
The interpretation of residential segregation indices gets complicated quickly, depending on their areal definition. Hence, we do not offer many suggestions. We highly recommend that users carefully go through Massey and Denton (1988) for more details when planning which indices and which areal units to use in the calculation. For readers who seek real-world examples, we refer to Iceland and Weinberg (2002).
The following are a few questions we had when calculating residential segregation across the US, rather than for specific metropolitan areas as in previous studies.
The residential segregation indices are calculated by aggregating statistics of smaller areal units, and these lower-level areal units can be defined in different ways. For example, when calculating the residential segregation indices at the county level, the smaller areal unit can be the census tract or the block. Different definitions can yield inconsistent scores, in both the magnitude and the ranking of the scores. How to interpret the inconsistency caused by different definitions of the lower-level areal unit remains an open question for us.
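A toy example shows how strongly the choice of lower-level unit can matter. The sketch below uses made-up counts for one county with four blocks nested in two tracts, and a hypothetical `dissimilarity` helper rather than the pipeline's own code. The same county is scored once from block counts and once from tract counts.

```r
# Dissimilarity index from counts in lower-level units (illustrative helper).
dissimilarity <- function(n_majority, n_minority) {
  0.5 * sum(abs(n_majority / sum(n_majority) - n_minority / sum(n_minority)))
}

# Toy county (made-up counts): four blocks nested in two tracts.
tract_id       <- c("t1", "t1", "t2", "t2")
block_majority <- c(90, 10, 40, 60)
block_minority <- c(10, 90, 60, 40)

# County score using blocks as the lower-level unit.
d_block <- dissimilarity(block_majority, block_minority)

# Aggregate blocks to tracts, then score the county using tracts.
tract_majority <- as.numeric(tapply(block_majority, tract_id, sum))
tract_minority <- as.numeric(tapply(block_minority, tract_id, sum))
d_tract <- dissimilarity(tract_majority, tract_minority)

c(block_level = d_block, tract_level = d_tract)
```

Here the block-level score is 0.5 while the tract-level score is 0, because the uneven blocks cancel out once they are summed into tracts. Real data are rarely this extreme, but the example shows why scores based on different lower-level units are not directly comparable.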
When calculating RS across multiple states, we observed areal units with few majority and minority residents, which can happen because we do not assume that the majority and minority sum to the total. For example, if we define White as the majority and Black as the minority, areal units on Indian reservations contain few members of either group. Are the indices still well defined in this case?
With improved efforts to collect more diverse racial information, it is possible to have individuals who have more than one racial background. For example, the 2010 census data has a variable code `P003008` for `Total!!Two or More Races`. How to use this information in calculating racial RS could lead to a fruitful discussion.
We prefer questions or bug reports via the Issues tab of the repository, so that the answer to your question can serve a broader audience. We are also open to questions via email if you are not comfortable with that approach.
If you would like to contribute to this tutorial, we welcome contributions via pull requests, so that you get proper credit.