GeoDaCenter / opioid-policy-scan

The Opioid Environment Policy Scan provides access to data at multiple spatial scales to help characterize the multi-dimensional risk environment impacting opioid use in justice populations across the United States.

Standardize shapefiles #68

Closed mradamcox closed 11 months ago

mradamcox commented 12 months ago

A little overhaul of the spatial data here would be good. Some tasks include:

mradamcox commented 11 months ago

After a good bit of research on different IDs via the Census Bureau website (and help from folks on their Slack workspace), I've decided to create a new hybrid identifier for our purposes here, HEROP_ID. This will allow us to add a new field to the CSVs and Shapefiles without changing the meaning of any existing columns, like GEOID, and we can retain all existing columns for backward compatibility. This field will streamline the join process and provide a single structure for all geographic levels.

Here's an example for each level (modified table excerpt from Understanding Geographic Identifiers (GEOIDs)):

| Area Type | GEOID Structure | Number of Digits | Example Geographic Area | Example HEROP_ID |
| --- | --- | --- | --- | --- |
| State | STATE | 2 | Texas | 040US48 |
| County | STATE+COUNTY | 2+3=5 | Harris County, TX | 050US48201 |
| Census Tract | STATE+COUNTY+TRACT | 2+3+6=11 | Census Tract 2231 in Harris County, TX | 140US48201223100 |
| ZCTA | ZCTA | 5 | Suitland, MD ZCTA | 860US20746 |

The HEROP_ID format is similar to, but simpler than, the GEOID format from data.census.gov that is described at the bottom of Understanding Geographic Identifiers (GEOIDs). The latter has four additional internal digits: 2 for Geographic Variant and 2 for Geographic Component. We don't record that information in our geometries now (I'm not even sure it is relevant to the geographic areas we are working with anyway), so these four digits are eliminated. That leaves us with the following format:

Summary Level Code + "US" + GEOID

Where Summary Level Code is a 3-digit number, and the "US" in the middle will force the value to text in any spreadsheet software, as the "G" prefix has done in the past.
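To make the format concrete, here is a minimal Python sketch of how such an id could be assembled. The function name `make_herop_id`, the level keys, and the length check are my own illustrative assumptions, not code from the repository. The summary level codes and digit counts come from the example table above.

```python
# Summary level codes, taken from the example table in this thread.
SUMMARY_LEVELS = {
    "state": "040",
    "county": "050",
    "tract": "140",
    "zcta": "860",
}

# Expected number of GEOID digits per level, also from the table.
GEOID_LENGTHS = {"state": 2, "county": 5, "tract": 11, "zcta": 5}


def make_herop_id(level: str, geoid: str) -> str:
    """Build Summary Level Code + "US" + GEOID (hypothetical helper)."""
    geoid = str(geoid)
    # Guard against GEOIDs that lost leading zeros or have the wrong length.
    if not geoid.isdigit() or len(geoid) != GEOID_LENGTHS[level]:
        raise ValueError(f"unexpected GEOID {geoid!r} for level {level!r}")
    return f"{SUMMARY_LEVELS[level]}US{geoid}"


print(make_herop_id("county", "48201"))
```

Keeping the GEOID as a string throughout matters here: if it has already been read as a number, leading zeros (e.g. Alabama's state code 01) are gone and the length check will raise rather than silently produce a wrong id.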

mradamcox commented 11 months ago

One update to the HEROP_ID is the addition of a suffix for the year, which is necessary now that we have multiple years of geographies. If a row is meant to join to a 2018 county geography, for example, its id will now be composed as Summary Level Code + "US" + State FP + County FP + "-" + Geography Year: 050US01001-2018
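As a rough sketch of the year-suffixed form for counties, something like the following would build and split such ids. The names `make_county_herop_id`, `parse_herop_id`, and `HEROP_ID_RE` are hypothetical, and the zero-padding of the FIPS parts is my assumption about how inputs might arrive, not something settled in this thread.

```python
import re

# 3-digit summary level, "US", GEOID digits, "-", 4-digit geography year.
HEROP_ID_RE = re.compile(r"^(\d{3})US(\d+)-(\d{4})$")


def make_county_herop_id(state_fp, county_fp, year) -> str:
    """Compose "050" + "US" + State FP + County FP + "-" + year."""
    # Assumption: pad FIPS parts so that 1 -> "01" and 1 -> "001".
    state = str(state_fp).zfill(2)
    county = str(county_fp).zfill(3)
    return f"050US{state}{county}-{year}"


def parse_herop_id(herop_id: str) -> dict:
    """Split a year-suffixed id into its summary level, GEOID, and year."""
    match = HEROP_ID_RE.match(herop_id)
    if match is None:
        raise ValueError(f"not a year-suffixed HEROP_ID: {herop_id!r}")
    level, geoid, year = match.groups()
    return {"summary_level": level, "geoid": geoid, "year": int(year)}


print(make_county_herop_id("01", "001", 2018))
print(parse_herop_id("050US01001-2018"))
```

Being able to split the id back apart is handy when a table needs to be joined against the geography file for a specific year.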