Open awunderground opened 1 month ago
I noticed a couple of files in the `geographic-crosswalks/data` folder are unused. For example, none of the files starting with `long-xwalk` are used by any script. It looks like these were added in April 2023 but with no accompanying script. Likewise, the `names-to-fips` files aren't called in any scripts.
Should these be deleted from the repo to avoid confusion?
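Before deleting anything, it may help to confirm the list of unused files mechanically. The following is a minimal sketch, not an existing script in the repo: the function name `find_unreferenced_files` is hypothetical, it is written in Python purely for illustration (the repo's own scripts are R), and it assumes that scripts refer to data files by their literal file name.

```python
from pathlib import Path


def find_unreferenced_files(data_dir, repo_root, script_suffixes=(".r", ".qmd")):
    """Return names of files in data_dir that no script under repo_root mentions.

    Assumption: a file counts as "used" if its exact name appears anywhere in
    the text of a script. Names assembled at runtime will not be detected.
    """
    data_dir, repo_root = Path(data_dir), Path(repo_root)

    # Read every script once so each data file is checked against cached text.
    script_texts = [
        path.read_text(encoding="utf-8", errors="ignore")
        for path in repo_root.rglob("*")
        if path.is_file() and path.suffix.lower() in script_suffixes
    ]

    unreferenced = []
    for data_file in sorted(data_dir.iterdir()):
        if not data_file.is_file():
            continue
        if not any(data_file.name in text for text in script_texts):
            unreferenced.append(data_file.name)
    return unreferenced
```

A file name that a script builds at runtime (for example by pasting a year onto a prefix) would be reported as unreferenced, so the output is best treated as a list of candidates to review by hand rather than a deletion list.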
I wonder if it would be beneficial to standardize the names of the crosswalk files to something like `{source-geography}-{target-geography}-crosswalk_{year}`.
To avoid breaking existing code that relies on crosswalks with the old file names, we could start by encouraging the new file name convention in the documentation and edit the scripts at a later date (maybe next year, after the metrics are updated?).
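To make the proposed convention concrete, here is a small sketch of how old names could be mapped to new ones. It is illustrative only: the helper names `crosswalk_name` and `standardized_name` are hypothetical, the `.csv` extension is an assumption, and the legacy pattern is inferred from names like `tract-to-county_2018.csv` rather than from a full inventory of the folder. It is written in Python for illustration, although the repo's scripts are R.

```python
import re

# Assumed legacy pattern, inferred from names such as "tract-to-county_2018.csv".
LEGACY_PATTERN = re.compile(
    r"^(?P<source>[a-z]+)-to-(?P<target>[a-z]+)_(?P<year>\d{4})\.csv$"
)


def crosswalk_name(source, target, year, ext=".csv"):
    """Build a name following {source-geography}-{target-geography}-crosswalk_{year}."""
    return f"{source}-{target}-crosswalk_{year}{ext}"


def standardized_name(legacy_name):
    """Return the proposed new name for a legacy file name, or None if it does not match."""
    match = LEGACY_PATTERN.match(legacy_name)
    if match is None:
        return None
    return crosswalk_name(match["source"], match["target"], int(match["year"]))


print(standardized_name("tract-to-county_2018.csv"))
# tract-county-crosswalk_2018.csv
```

A mapping like this could be listed in the documentation first, so the old names keep working until the scripts are edited.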
@awunderground I also noticed that the process of downloading the tract-to-place 2020 crosswalk from geocorr is documented in both `poverty-exposure-2021.qmd` and `race-ethnicity-exposure-city-2021.qmd`. This strikes me as duplicative and hard to find!
Should I create a new `.qmd` in the `geographic-crosswalks` folder with this information? I think it should also contain the code in `create_crosswalk_file.R`, since that script creates `tract-to-county_2018.csv` and `tract-to-county_2020.csv`.
We need to deal with at least two types of geographic harmonization:
We need to develop a comprehensive plan for addressing both of these questions and create “target” files and crosswalk files that all developers will use to consistently create metrics over time.
First, we hope to create a Quarto document that articulates a plan to answer big questions:
This document should include the number of geographies in each year, justifications for decisions, and documentation of important changes to geographies.
Second, we need to update the repo to contain the standardized target files and crosswalks described in step 1.