UI-Research / mobility-from-poverty

https://ui-research.github.io/mobility-from-poverty/

Set up geographic harmonization for next update #374

Open awunderground opened 1 month ago

awunderground commented 1 month ago

We need to deal with at least two types of geographic harmonization:

  1. How geographies relate from year to year. For example, the Census Bureau switched from reporting data for eight counties in Connecticut to nine planning regions in Connecticut for 2022 data.
  2. How geographies where data are reported (e.g., PUMAs and ZCTAs) relate to our target geographies (counties and populous census places).
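
To make the second type of harmonization concrete, here is a minimal sketch of how a crosswalk with allocation factors could move counts from a reporting geography to a target geography. It is written in Python purely for illustration (the repo's scripts are R), and every name and number in it (`puma_A`, `county_1`, the factors, `allocate_to_targets`) is hypothetical, not taken from the repo's crosswalk files.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source PUMA, target county, allocation factor).
# The factor is the assumed share of the PUMA that falls in the county.
# All values are invented for illustration.
crosswalk = [
    ("puma_A", "county_1", 1.0),
    ("puma_B", "county_1", 0.25),
    ("puma_B", "county_2", 0.75),
]

# Hypothetical counts reported at the source geography.
puma_counts = {"puma_A": 100.0, "puma_B": 200.0}


def check_factors_sum_to_one(crosswalk, tolerance=1e-9):
    """Return the source geographies whose factors do not sum to 1."""
    sums = defaultdict(float)
    for source, _target, factor in crosswalk:
        sums[source] += factor
    return sorted(s for s, total in sums.items() if abs(total - 1.0) > tolerance)


def allocate_to_targets(counts, crosswalk):
    """Split each source count across targets using the allocation factors."""
    totals = defaultdict(float)
    for source, target, factor in crosswalk:
        totals[target] += counts[source] * factor
    return dict(totals)


bad_sources = check_factors_sum_to_one(crosswalk)
county_counts = allocate_to_targets(puma_counts, crosswalk)
```

Checking that the factors for each source geography sum to one before allocating is a cheap guard against a crosswalk that silently drops or double-counts population.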

We need to develop a comprehensive plan for addressing both of these questions and create “target” files and crosswalk files that all developers will use to consistently create metrics over time.

First, we hope to create a Quarto document that articulates a plan to answer big questions:

This document should include the number of geographies in each year, justifications for decisions, and documentation of important changes to geographies.
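
As a rough illustration of the per-year counts such a document could report, the sketch below tallies geographies by year from a target-file-like list and lists which identifiers appear or disappear between two years. It is Python for illustration only, not the repo's R code. The identifiers (`ct_county_1`, `ct_region_1`, and so on) are placeholders, and treating 2021 as the last county-based year is an assumption; only the counts of eight counties and nine planning regions come from the discussion above.

```python
from collections import Counter

# Placeholder rows standing in for a target file: (year, geography id).
# Eight county ids for the assumed prior year, nine planning-region ids for 2022.
target_rows = (
    [(2021, f"ct_county_{i}") for i in range(1, 9)]
    + [(2022, f"ct_region_{i}") for i in range(1, 10)]
)


def count_by_year(rows):
    """Number of geographies listed in each year."""
    return dict(Counter(year for year, _geo in rows))


def changed_geographies(rows, old_year, new_year):
    """Ids present only in old_year (dropped) and only in new_year (added)."""
    old = {geo for year, geo in rows if year == old_year}
    new = {geo for year, geo in rows if year == new_year}
    return sorted(old - new), sorted(new - old)


counts = count_by_year(target_rows)
dropped, added = changed_geographies(target_rows, 2021, 2022)
```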

Second, we need to update the repo to contain the standardized target files and crosswalks described in step 1.

malcalakovalski commented 2 days ago

I noticed a couple of files in the geographic-crosswalks/data folder are unused. For example, none of the files starting with long-xwalk are used by any script. It looks like these were added in April 2023 but with no accompanying script. Likewise, the names-to-fips files aren't called in any scripts.

Should these be deleted from the repo to avoid confusion?
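
One way to double-check which data files are unused is to search the text of every script for each file name. The sketch below is a hedged Python illustration of that idea, not something that exists in the repo. The function name `find_unreferenced_files` and the list of script extensions are assumptions, and a plain substring match will miss files whose paths are built dynamically (for example, by pasting a year into a file name).

```python
from pathlib import Path

# Assumed set of file extensions that count as "scripts".
SCRIPT_SUFFIXES = {".R", ".r", ".qmd", ".Rmd", ".py"}


def find_unreferenced_files(data_dir, code_dir):
    """Return names of files in data_dir that no script under code_dir mentions."""
    data_dir, code_dir = Path(data_dir), Path(code_dir)
    # Concatenate the text of every script so each file name is searched once.
    code_text = "\n".join(
        path.read_text(errors="ignore")
        for path in code_dir.rglob("*")
        if path.is_file() and path.suffix in SCRIPT_SUFFIXES
    )
    return sorted(
        path.name
        for path in data_dir.iterdir()
        if path.is_file() and path.name not in code_text
    )


# Example call, assuming it is run from the repo root:
# find_unreferenced_files("geographic-crosswalks/data", ".")
```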

malcalakovalski commented 2 days ago

I wonder if it would be beneficial to standardize the name of the crosswalk files to be something like {source-geography}-{target-geography}-crosswalk_{year}.
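
For illustration, the proposed pattern could be captured in a small helper that builds names and checks whether an existing name follows the convention. This is a Python sketch only (the repo's scripts are R). The function names, the `.csv` extension, and the restriction to single-word lowercase geography names are all assumptions made here, not part of the proposal.

```python
import re

# Assumed form: {source}-{target}-crosswalk_{year}.csv with single-word,
# lowercase geography names and a four-digit year.
NAME_PATTERN = re.compile(
    r"^(?P<source>[a-z0-9]+)-(?P<target>[a-z0-9]+)-crosswalk_(?P<year>\d{4})\.csv$"
)


def crosswalk_filename(source, target, year):
    """Build a file name that follows the proposed convention."""
    return f"{source}-{target}-crosswalk_{year}.csv"


def follows_convention(filename):
    """True if filename matches the assumed pattern above."""
    return NAME_PATTERN.match(filename) is not None


new_name = crosswalk_filename("tract", "county", 2020)
old_name_ok = follows_convention("tract-to-county_2018.csv")
```

A check like `follows_convention` could flag files that still use an old name without forcing anyone to rename them yet.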

To avoid breaking existing code that relies on crosswalks with the old file names, we could start by encouraging adoption of the new naming convention in the documentation and edit the scripts at a later date (maybe next year, after the metrics are updated?).

malcalakovalski commented 19 hours ago

@awunderground I also noticed that the process of downloading the 2020 tract-to-place crosswalk from geocorr is documented in both poverty-exposure-2021.qmd and race-ethnicity-exposure-city-2021.qmd. This strikes me as duplicative and hard to find!

Should I create a new .qmd in the geographic-crosswalks folder with this information? I think it should also contain the code in create_crosswalk_file.R, since that script creates tract-to-county_2018.csv and tract-to-county_2020.csv.