cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Split ZIP->county mapping from county->{MSA, HRR} mapping #117

Open capnrefsmmat opened 4 years ago

capnrefsmmat commented 4 years ago

We currently map counties to MSAs or HRRs by using our file that maps ZIPs to counties. However, some counties are not the primary county for any ZIP; for sources like USAFacts, this prevents us from correctly ingesting and reporting those counties, and means that mapping them to MSAs and HRRs is more complicated than necessary.

If we can find an authoritative source for MSA and HRR counties, we should create a separate mapping file that we know contains all counties, and then keep the current ZIP->county mapping solely for dealing with data reported at the ZIP level.
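To illustrate why a county that is never a ZIP's primary county gets lost, here is a minimal sketch that contrasts deriving county->MSA through a shared ZIP key with using a direct county->MSA table. All of the ZIP codes, the `MSA_X` label, the counts, and the helper names (`derive_county_to_msa`, `aggregate_to_msa`) are hypothetical toy data, not the repository's actual files or functions. The same idea applies to HRRs.

```python
from collections import defaultdict

# All tables below are hypothetical toy data for illustration only.
# ZIP -> primary county (5-digit FIPS). County 28039 is never a primary county.
ZIP_TO_COUNTY = {"00001": "28059", "00002": "28059", "00003": "28047"}
# ZIP -> MSA (placeholder MSA label, not a real CBSA code).
ZIP_TO_MSA = {"00001": "MSA_X", "00002": "MSA_X", "00003": "MSA_X"}
# Direct county -> MSA table that lists every county, including 28039.
COUNTY_TO_MSA = {"28059": "MSA_X", "28047": "MSA_X", "28039": "MSA_X"}
# County-level counts as a source such as USAFacts might report them.
COUNTY_VALUES = {"28059": 10, "28047": 5, "28039": 3}


def derive_county_to_msa(zip_to_county, zip_to_msa):
    """County -> MSA derived through a shared ZIP key (the current style).

    A county that is not the primary county of any ZIP never appears in the
    result. If a county's ZIPs fall in different MSAs, the last ZIP seen wins.
    """
    derived = {}
    for zip_code, county in zip_to_county.items():
        if zip_code in zip_to_msa:
            derived[county] = zip_to_msa[zip_code]
    return derived


def aggregate_to_msa(county_values, county_to_msa):
    """Sum county values into MSAs, returning unmapped counties separately."""
    totals = defaultdict(float)
    missing = []
    for fips, value in county_values.items():
        msa = county_to_msa.get(fips)
        if msa is None:
            missing.append(fips)  # report the gap instead of dropping it silently
        else:
            totals[msa] += value
    return dict(totals), missing


if __name__ == "__main__":
    via_zips = derive_county_to_msa(ZIP_TO_COUNTY, ZIP_TO_MSA)
    print(aggregate_to_msa(COUNTY_VALUES, via_zips))       # 28039 is reported as missing
    print(aggregate_to_msa(COUNTY_VALUES, COUNTY_TO_MSA))  # every county is counted
```

With the direct table, every reported county contributes to its MSA. The ZIP->county table would then only be needed for sources that report at the ZIP level.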

@jsharpna, this may be another task to add to your geographic aggregation unification.

jingjtang commented 4 years ago

The mapping files we currently use are described here: https://github.com/cmu-delphi/covid-19/tree/main/geographical_scope. The latest HRR mapping we could find is the 2017 version, while the MSA mapping file is the 2019 version.

For more detail on the issue with JHU and USAFacts, see the "Counties not in our canonical dataset" section of DETAILS.md for each source.

Our mapping between counties and MSAs/HRRs is based on ZIPs, so the FIPS list in the mapping file differs from the FIPS list in the reported files. For example, 28039 is in the JHU and USAFacts datasets but not in our mapping files. To map it to the corresponding MSAs/HRRs, 28039 would have to be split into "28059", "28041", "28131", "28045", "28109", and "28047".
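One way such a split could be done is proportional allocation: each unmapped county's value is distributed across target counties according to weights that sum to 1, so totals are preserved. This is a minimal sketch under that assumption. The weights below cover only a few of the FIPS codes listed above and are made up. Real weights would need to come from something like ZIP-level population data. The function name `split_counties` is hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: the share of county 28039 assigned to a few of the
# counties listed above. These numbers are made up for illustration.
SPLIT_WEIGHTS = {"28039": {"28059": 0.5, "28041": 0.3, "28131": 0.2}}


def split_counties(county_values, split_weights, tol=1e-9):
    """Reallocate values of unmapped counties onto mapped counties.

    Counties without an entry in split_weights pass through unchanged. Each
    weight set must sum to 1 so the overall total is preserved.
    """
    out = defaultdict(float)
    for fips, value in county_values.items():
        weights = split_weights.get(fips)
        if weights is None:
            out[fips] += value
            continue
        if abs(sum(weights.values()) - 1.0) > tol:
            raise ValueError(f"split weights for {fips} do not sum to 1")
        for target, share in weights.items():
            out[target] += value * share
    return dict(out)


if __name__ == "__main__":
    reported = {"28039": 10, "28059": 4}
    print(split_counties(reported, SPLIT_WEIGHTS))
```

The output no longer contains 28039, and its value has been spread over counties that the ZIP-based mapping can already place in MSAs/HRRs. A direct county->MSA/HRR table would make this workaround unnecessary.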

@jsharpna

jsharpna commented 4 years ago

Using the 2020 version of the Census Bureau's MSA list results in large changes from the 2018 version. The following notes are from the README in data_proc/...