Yu-Group / covid19-severity-prediction

Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈
https://arxiv.org/abs/2005.07882
MIT License

NYTimes has "City1" and "City2" as countyFIPS codes #3

Closed: SteveGoldstein closed this issue 4 years ago

SteveGoldstein commented 4 years ago

The last two lines of the processed nytimes_infections file begin with "City1" and "City2" in the first field. I believe City1 corresponds to the "New York City" line in the raw file (which has no FIPS code) and City2 to Kansas City, Missouri (also no FIPS code).

$ tail -2 nytimes_infections.csv |less -SX
City1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, ...
City2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,  ...

https://github.com/Yu-Group/covid19-severity-prediction/blob/master/data/county_level/processed/nytimes_infections/nytimes_infections.csv
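For anyone who wants to check for such rows programmatically, here is a minimal sketch in Python. It assumes the first column is named `countyFIPS` (taken from the issue title, not verified against the file) and runs on a small made-up excerpt rather than the real CSV; `non_fips_keys` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical excerpt mimicking the processed file. The header names and
# the numeric values are assumptions for illustration only.
SAMPLE = """countyFIPS,day1,day2
24005,0.0,1.0
24510,0.0,2.0
City1,0.0,0.0
City2,0.0,0.0
"""


def non_fips_keys(text, key="countyFIPS"):
    """Return the values in the key column that are not purely numeric."""
    reader = csv.DictReader(io.StringIO(text))
    # A real FIPS code is all digits; placeholders such as "City1" are not.
    # Only digits are checked (not length), since leading zeros may have
    # been dropped during processing.
    return [row[key] for row in reader if not row[key].strip().isdigit()]


print(non_fips_keys(SAMPLE))
```

On the real file, the same function could be given the file contents read with `open(...).read()`, provided the column name matches.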

shifwang commented 4 years ago

Yeah, since there is no countyFIPS code for these two cities, we use two made-up countyFIPS codes.

Makosak commented 4 years ago

Qinyun in our group also highlighted this. I think one challenge is that some of these sources sometimes capture cities differently than counties, though the goal is still to track counties. We can highlight this as one of the challenges across multiple COVID datasets.

SteveGoldstein commented 4 years ago

If the goal is to track counties, then there are some issues to sort out with entries in both the usafacts and nytimes raw files whose names contain "city" or "City". For example, Baltimore County (FIPS 24005) and Baltimore City (FIPS 24510) each appear in both files.
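A rough sketch of how these entries could be surfaced, in Python. The row layout `(fips, name, state)` and the helper names are hypothetical, and the rows are hand-written from the examples in this thread rather than read from the raw files.

```python
from collections import defaultdict

# Hypothetical rows in the form (fips, name, state), based only on the
# examples mentioned in this thread. An empty string means no FIPS code.
ROWS = [
    ("24005", "Baltimore County", "Maryland"),
    ("24510", "Baltimore City", "Maryland"),
    ("", "New York City", "New York"),
    ("", "Kansas City", "Missouri"),
]


def city_entries(rows):
    """Return rows whose name contains "city" in any letter case."""
    return [row for row in rows if "city" in row[1].lower()]


def name_collisions(rows):
    """Group rows by (state, base name) and keep groups with 2+ entries.

    The base name is the lowercased name with a trailing " county" or
    " city" removed, so "Baltimore County" and "Baltimore City" collide.
    """
    groups = defaultdict(list)
    for fips, name, state in rows:
        base = name.lower()
        for suffix in (" county", " city"):
            if base.endswith(suffix):
                base = base[: -len(suffix)]
        groups[(state, base)].append((fips, name))
    return {key: found for key, found in groups.items() if len(found) > 1}


print([name for _, name, _ in city_entries(ROWS)])
print(name_collisions(ROWS))
```

The first helper lists every "city" entry, including those without a FIPS code; the second flags places like Baltimore where a county and a city share a base name within one state and so need a decision on how to track them.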