cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Add remaining US Territories to geocoding support for JHU #507

Open brookslogan opened 3 years ago

brookslogan commented 3 years ago

The covidcast package (and perhaps the API?) does not recognize AS, GU, MP, or VI, but these jurisdictions (as well as the Diamond Princess) are reported in the JHU data.

## These succeed:
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="AL", start_day="2020-09-01", end_day="2020-09-15")
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="PR", start_day="2020-09-01", end_day="2020-09-15")
## These fail:
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="AS", start_day="2020-09-01", end_day="2020-09-15")
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="GU", start_day="2020-09-01", end_day="2020-09-15")
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="MP", start_day="2020-09-01", end_day="2020-09-15")
covidcast::covidcast_signal(data_source="jhu-csse", signal="deaths_incidence_num", geo_type="state", geo_values="VI", start_day="2020-09-01", end_day="2020-09-15")
## (and original source data also contains values for Diamond Princess)

The abbreviation-to-FIPS translation appears to be failing. It is also possible that the data is not available through the API at all; I have not tested this.
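To illustrate the suspected failure mode, here is a minimal Python sketch of an abbreviation-to-FIPS lookup that silently drops anything missing from its table. The two-digit codes are the standard state-level FIPS codes; the table names and both helper functions are hypothetical and are not taken from the covidcast package or the indicators code.

```python
# Hypothetical lookup tables; only the FIPS codes themselves are standard.
STATE_ABBR_TO_FIPS = {"AL": "01", "PR": "72"}  # small excerpt for illustration
TERRITORY_ABBR_TO_FIPS = {"AS": "60", "GU": "66", "MP": "69", "VI": "78"}


def abbr_to_fips(abbr, table):
    """Return the two-digit FIPS code for a postal abbreviation, or None."""
    return table.get(abbr.upper())


def split_known(abbrs, table):
    """Split abbreviations into those the table can translate and those it drops."""
    known, dropped = {}, []
    for abbr in abbrs:
        code = abbr_to_fips(abbr, table)
        if code is None:
            dropped.append(abbr)
        else:
            known[abbr] = code
    return known, dropped


requested = ["AL", "PR", "AS", "GU", "MP", "VI"]

# Without the territory entries, the four territories are filtered out.
_, dropped_before = split_known(requested, STATE_ABBR_TO_FIPS)

# With the territory entries merged in, every request is translated.
full_table = {**STATE_ABBR_TO_FIPS, **TERRITORY_ABBR_TO_FIPS}
known_after, dropped_after = split_known(requested, full_table)
```

If the real lookup behaves like this, extending its table with the four territory codes would be the smallest possible change, although population and county-level crosswalks would still be missing.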

krivard commented 3 years ago

We probably don't have geocoding information for those regions, so they get filtered out. What's the priority on a fix?

brookslogan commented 3 years ago

My use case is an analysis I would like to start running in the next couple of days and have ready for a manuscript in a couple of weeks. I doubt this single use is enough to elevate it to a high priority though (so I am aiming to load this data via another route). The missing locations are the lowest population jurisdictions for this geo level, so I would expect this to be a lower priority averaged across all stakeholders.

dshemetov commented 3 years ago

Can confirm, we currently drop those geocodes. Issues to consider in fixing:

chinandrew commented 3 years ago

Looking into this a bit, it appears JHU only reports at the "state" FIPS level, no counties. However, county level FIPS codes do exist, e.g. here (bottom of page) with the Virgin Islands. They don't show up in the population file (FIPS_POPULATION_URL = "https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv"), but they can be found online elsewhere (e.g. this census brochure).

krivard commented 3 years ago

Mmm, extracting data from a PDF is not super sustainable

chinandrew commented 3 years ago

> Mmm, extracting data from a PDF is not super sustainable

Indeed. I'll keep looking around, I'm sure it's in CSV form somewhere. I wonder why it's not in that other census file though...

chinandrew commented 3 years ago

Apparently the Island Areas (American Samoa, Northern Mariana Islands, Guam, and U.S. Virgin Islands) don't go through the usual non-decennial censuses, and even the decennial ones are at least nominally differentiated by name ("Island Area Census"). If we're just doing state level we may need to use the 2010 numbers.

chinandrew commented 3 years ago

Leaving this here for whenever this eventually gets worked on:

60,American Samoa,55519
60010,Eastern District,23030 
60020,Manu'a District,1143
60030,Rose Island,0
60040,Swains Island,17
60050,Western District,31329

66,Guam,159358
66010,Guam,159358

78,US Virgin Islands,106405
78010,Saint Croix Island,50601
78020,Saint John Island,4170
78030,Saint Thomas Island,51634

69,Northern Mariana Islands,53883
69085,Northern Islands Municipality,0 
69100,Rota Municipality,2527
69110,Saipan Municipality,48220
69120,Tinian Municipality,3136

population data from https://www.census.gov/data/tables/2010/dec/2010-island-areas.html and codes from https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county#cite_note-CensusT-15

Haven't found zip code populations, though there are "places" in that census link that may be translatable.
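As a sanity check on the figures posted above, the following Python sketch parses the rows into state-level and county-level tables and confirms that each territory's county populations sum to its total. The values are copied from the table in this thread, with stray spaces inside the figures stripped; the function names are hypothetical and this is not how the indicators pipeline loads population data.

```python
import csv
import io

# Rows copied from this thread: FIPS code, name, 2010 population.
ISLAND_AREA_ROWS = """\
60,American Samoa,55519
60010,Eastern District,23030
60020,Manu'a District,1143
60030,Rose Island,0
60040,Swains Island,17
60050,Western District,31329
66,Guam,159358
66010,Guam,159358
78,US Virgin Islands,106405
78010,Saint Croix Island,50601
78020,Saint John Island,4170
78030,Saint Thomas Island,51634
69,Northern Mariana Islands,53883
69085,Northern Islands Municipality,0
69100,Rota Municipality,2527
69110,Saipan Municipality,48220
69120,Tinian Municipality,3136
"""


def load_populations(text):
    """Split rows into state-level (2-digit) and county-level (5-digit) dicts."""
    states, counties = {}, {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # tolerate blank separator lines
        fips, name, pop = (field.strip() for field in row)
        target = states if len(fips) == 2 else counties
        target[fips] = (name, int(pop))
    return states, counties


def county_totals(counties):
    """Sum county populations by their two-digit state FIPS prefix."""
    totals = {}
    for fips, (_, pop) in counties.items():
        totals[fips[:2]] = totals.get(fips[:2], 0) + pop
    return totals


states, counties = load_populations(ISLAND_AREA_ROWS)
totals = county_totals(counties)

# Territories whose county rows do not add up to the stated total.
mismatches = {
    fips: (pop, totals.get(fips))
    for fips, (_, pop) in states.items()
    if totals.get(fips) != pop
}
```

An empty `mismatches` dict means the county rows are internally consistent with the territory totals, which is a cheap check to rerun if these numbers are ever moved into a CSV in the repository.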