CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

US states no longer available in time series data??? #1578

Open sean-mcclure opened 4 years ago

sean-mcclure commented 4 years ago

With the latest change to format, the new time series files no longer list the US states. What happened to the US state time series data?

drcanak commented 4 years ago

There are endless posts in the "Issues" regarding changes to the data, and how those changes are breaking all manner of analyses people were doing that relied on this data.

To your question: the state-level data you are hoping to find (as am I) is no longer updated or available as it once was. The best option right now is the daily reports. The trouble there is that the CSV files change from one day to the next, both in which columns are present and in what order those columns appear. So it's not as easy as pulling them all down and simply merging them together.

Just go through the "Issues" to see what a hot mess this became for people who have come to rely on this data. I don't get it. JH is full of super smart people. How they made such a mess of this, so quickly, is anyone's guess.

If anyone has any R code that can pull these daily update files down, clean them, and get them merged into one big, flat file, I sure would be appreciative. The time series at the State level is very helpful for modeling/planning purposes. My R skills are just not that good.
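For anyone attempting the merge described above, here is a minimal sketch of the idea, in Python rather than R and using only the standard library. It assumes each daily report is already available as CSV text and that the drift between days amounts to renamed, added, or reordered columns. The alias map, the canonical column names, and the function names `normalize_report` and `merge_reports` are all hypothetical; adjust the aliases to whatever headers the files actually contain.

```python
import csv
import io

# Hypothetical alias map: header spellings assumed for illustration,
# each mapped to one canonical column name.
ALIASES = {
    "province/state": "province_state",
    "province_state": "province_state",
    "country/region": "country_region",
    "country_region": "country_region",
    "confirmed": "confirmed",
    "deaths": "deaths",
}
CANONICAL = ["province_state", "country_region", "confirmed", "deaths"]


def normalize_report(csv_text, report_date):
    """Return the rows of one daily report as dicts keyed by canonical names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {"date": report_date}
        for header, value in raw.items():
            # Look columns up by name, so their order in the file is irrelevant.
            cleaned = (header or "").lstrip("\ufeff").strip().lower()
            key = ALIASES.get(cleaned)
            if key:
                row[key] = (value or "").strip()
        # Columns missing from this day's file become empty strings.
        for name in CANONICAL:
            row.setdefault(name, "")
        rows.append(row)
    return rows


def merge_reports(reports):
    """Merge {date: csv_text} into one flat list of rows, ordered by date.

    Dates are assumed to be ISO strings (YYYY-MM-DD) so that plain
    string sorting matches chronological order.
    """
    merged = []
    for report_date in sorted(reports):
        merged.extend(normalize_report(reports[report_date], report_date))
    return merged


if __name__ == "__main__":
    # Made-up sample data showing two different header layouts.
    old = "Province/State,Country/Region,Confirmed,Deaths\nWashington,US,10,1\n"
    new = "FIPS,Admin2,Province_State,Country_Region,Confirmed,Deaths\n53033,King,Washington,US,20,2\n"
    for row in merge_reports({"2020-03-23": new, "2020-03-21": old}):
        print(row)
```

Because every column is resolved through the alias map, a new header spelling only requires one more entry in `ALIASES`, and unknown columns are silently dropped rather than breaking the merge.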

mdibenigno commented 4 years ago

Couldn't agree more, @drcanak! I've been patiently waiting so I can refresh my R scripts with state-level updates, but it's not cool that it has taken so long. I did find this data source, which could be a good substitute, but I'm hesitant to recode everything: https://covidtracking.com/

rks125 commented 4 years ago

I’ve already posted temporary CSV files that give you what you had before. Use them if you like. I've included a Power Query in case you want to update them yourself. I will update daily until JH resolves the issue.

https://www.soothsawyer.com/john-hopkins-time-series-data-confirmed-case-csv-after-march-22-2020/?github=4

drcanak commented 4 years ago

Yes, thank you @rks125. I did see a post you made elsewhere, grabbed your files, fumbled around a bit not knowing anything about Power Query (oh, you have to install it ;-) ), and was able to get sheets that reflect the updates. Thank you for this!

bfosten commented 4 years ago

In case it's helpful for folks encountering this issue, we've set up a new repo, CovidAPI, which provides time series data for states and provinces (see Regions), as well as other formats of the Johns Hopkins data.

sean-mcclure commented 4 years ago

Thanks @bfosten, the coviddata/covid-api is indeed much better.

covid19.js now uses this API as its data source, as it appears much more reliable. Please don't change the formatting :)

sean-mcclure commented 4 years ago

And then he goes and changes the URLs, and apparently something with the formatting since now the data cannot be fetched. I give up. It appears it is too much for anyone to leave their original data formatting/destination alone. Data sources that change like this are useless.

bfosten commented 4 years ago

@sean-mcclure Apologies for this. Having "API" in the repo name was confusing people who just wanted CSVs, so it has been renamed from covid-api to coviddata (which matches the GitHub organization name). The format shouldn't have changed; I just tried covid19.js with the new URLs, and the README examples I tried seem to be working, AFAICT. I've opened a PR with the changes, in case it's useful.

kedionai commented 4 years ago

No worries. I had to change the links for the CSV fetches but once I did that everything worked again. Thanks for not changing the formatting.