Open capnrefsmmat opened 4 years ago
The mapping files we currently use are described at https://github.com/cmu-delphi/covid-19/tree/main/geographical_scope. The latest HRR mapping we could find is the 2017 version, while the MSA mapping file is the 2019 version.
For more detail on the issue with JHU and USA Facts, see the "Counties not in our canonical dataset" section in DETAILS.md for each source.
Our mapping between counties and MSAs/HRRs is based on ZIPs, and the FIPS list in the mapping file differs from the FIPS list in the reported files. For example, 28039 is in the JHU and USAFacts datasets but not in our mapping files. 28039 should be divided into "28059", "28041", "28131", "28045", "28059", "28109", "28047" so that we can map it to the corresponding MSAs/HRRs.
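A minimal sketch of how the mismatch could be detected, assuming both FIPS lists are already loaded as sets of strings. The function name `unmapped_counties` is hypothetical, and the two sets are toy inputs built from the codes in the example above, not the contents of the real files.

```python
# Toy inputs (assumption): real sets would be read from the mapping file
# and from the JHU / USAFacts reports.
mapping_fips = {"28059", "28041", "28131", "28045", "28109", "28047"}
reported_fips = {"28039", "28059", "28047"}


def unmapped_counties(reported, mapped):
    """Return reported FIPS codes that have no entry in the mapping, sorted."""
    return sorted(set(reported) - set(mapped))


missing = unmapped_counties(reported_fips, mapping_fips)
print(missing)
```

Keeping FIPS codes as strings preserves leading zeros, which matters for states whose codes start with 0.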
@jsharpna
Using the 2020 version of the MSA list from the Census Bureau results in large changes from the 2018 version. The following notes are from the README in data_proc/...
old version notes: 'msa_id' and 'msa_name' are added according to the msa_list.csv that Aaron found from https://apps.bea.gov/regional/docs/msalist.cfm (2019)
new version notes: 03_20_MSAs.xls : [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html)
Pulled the 2018 version from the Census Bureau and compared it against msa_list; 51 FIPS mappings in HI and VA differ.
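A comparison like this could be sketched roughly as follows, assuming each list has been exported to CSV with `fips` and `msa_id` columns. The column names, the helper names, and the inline data are placeholders for illustration, not the real files.

```python
import csv
import io

# Placeholder data (assumption): stand-ins for two versions of a county-to-MSA list.
OLD_CSV = """fips,msa_id
00001,100
00002,100
00003,200
"""
NEW_CSV = """fips,msa_id
00001,100
00002,300
00004,200
"""


def load_mapping(text):
    """Parse CSV text into a dict of FIPS -> MSA id, keeping both as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["fips"]: row["msa_id"] for row in reader}


def diff_mappings(old, new):
    """Return (changed, only_old, only_new) for two FIPS -> MSA dicts."""
    shared = old.keys() & new.keys()
    changed = {f: (old[f], new[f]) for f in shared if old[f] != new[f]}
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return changed, only_old, only_new


changed, only_old, only_new = diff_mappings(load_mapping(OLD_CSV), load_mapping(NEW_CSV))
print(changed, only_old, only_new)
```

Reporting reassigned counties separately from counties present in only one list makes it easier to tell delineation changes apart from coverage gaps.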
We currently map counties to MSAs or HRRs by using our file that maps ZIPs to counties. However, some counties are not the primary county for any ZIP; for sources like USAFacts, this prevents us from correctly ingesting and reporting those counties, and means that mapping them to MSAs and HRRs is more complicated than necessary.
If we can find an authoritative source for MSA and HRR counties, we should create a separate mapping file that we know contains all counties, and then keep the current ZIP->county mapping solely for dealing with data reported at the ZIP level.
@jsharpna, this may be another task to add to your geographic aggregation unification.
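To illustrate how a county can drop out of a ZIP-based mapping, here is a small sketch. It assumes the primary county of a ZIP is the one holding the largest share of that ZIP, which may not match how our file defines it. The rows, identifiers, and function names are all hypothetical.

```python
# Placeholder rows (assumption): (zip, county_fips, share of the ZIP in that county).
ZIP_COUNTY = [
    ("Z1", "C1", 0.9),
    ("Z1", "C2", 0.1),
    ("Z2", "C1", 0.6),
    ("Z2", "C3", 0.4),
    ("Z3", "C3", 1.0),
]


def primary_county_by_zip(rows):
    """Pick, for each ZIP, the county with the largest share."""
    best = {}
    for zip_code, county, share in rows:
        if zip_code not in best or share > best[zip_code][1]:
            best[zip_code] = (county, share)
    return {zip_code: county for zip_code, (county, _) in best.items()}


def never_primary(rows):
    """Return counties that appear in the table but are primary for no ZIP."""
    all_counties = {county for _, county, _ in rows}
    primaries = set(primary_county_by_zip(rows).values())
    return sorted(all_counties - primaries)


dropped = never_primary(ZIP_COUNTY)
print(dropped)
```

Any county returned here would be invisible to a county -> MSA/HRR mapping derived only from primary counties, which is the gap a dedicated county-level file would close.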