Closed by Vincent-Stragier 4 years ago
Yes, and some rows are missing data because the series continues under the duplicated name. This means the data requires some manual cleanup before it can be used easily.
I've made this visualization: http://yyahn.com/covid19/ and published my workflow: https://github.com/yy/covid19-data
This workflow converts the dataset into a tidy (long) format and then merges it with World Bank statistics. Feel free to use any part of it!
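For readers unfamiliar with the wide-to-long step, here is a minimal sketch of the idea. It is not taken from the linked workflow. It assumes the time-series files have the identifying columns `Province/State`, `Country/Region`, `Lat` and `Long` followed by one column per date. The sample rows and the `wide_to_long` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the assumed wide layout: four identifying
# columns followed by one column per date. The values are made up.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,France,46.2,2.2,0,2
Example Province,Example Country,0.0,0.0,10,12
"""

ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def wide_to_long(rows):
    """Yield one record per (location, date) pair."""
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # only the date columns carry counts
            yield {
                "province": row["Province/State"],
                "country": row["Country/Region"],
                "date": column,
                # An empty cell is kept as None rather than guessed.
                "count": int(value) if value else None,
            }


long_rows = list(wide_to_long(csv.DictReader(io.StringIO(SAMPLE))))
for record in long_rows:
    print(record)
```

Each input row becomes one output record per date column, which is the shape most plotting and merging tools expect.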
I'm going to be doing something similar for: https://github.com/pixelscript/covid-19-map
I don't know why they didn't just rename the existing rows instead of having duplicates.
I've noticed that states were added to the US data in the time series today. This effectively duplicates data from earlier. Seeing the jump in numbers after adding today's data initially freaked me out until I realized the issue. I wonder if there might be a way to consolidate the data. For example, a single row per country with all data by date would be much appreciated, like what you have for France: just the data for the entire US in one row. Otherwise I'll need to massage the data at each update instead of using this file as is.
@klahoda Yes, separating country-level statistics and sub-country numbers would be great!
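Until the files are split, a country-level row can be approximated by summing all rows that share a `Country/Region`. The sketch below rests on a strong assumption: county-level and state-level rows never report the same cases on the same date (for example, county rows drop to zero once state rows start). If they overlap, the sum double counts. The sample values and the `totals_by_country` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: a county row that stops reporting when the
# state rows start. All values are made up.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/9/20,3/10/20
"King County, WA",US,47.6,-122.3,5,0
Washington,US,47.4,-121.5,0,7
New York,US,42.2,-74.9,0,3
,France,46.2,2.2,4,6
"""

ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def totals_by_country(rows):
    """Sum every date column over all rows of the same country.

    ASSUMPTION: sub-country rows do not overlap on any date;
    otherwise the totals are inflated.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for column, value in row.items():
            if column not in ID_COLUMNS and value:
                totals[row["Country/Region"]][column] += int(value)
    return {country: dict(dates) for country, dates in totals.items()}


country_totals = totals_by_country(csv.DictReader(io.StringIO(SAMPLE)))
for country, dates in country_totals.items():
    print(country, dates)
```

This has to be rerun after each update, so it is a workaround rather than a substitute for separate country-level and sub-country files.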
Have these issues been solved by the latest releases? (If so, please close this ticket in order to help the JHU team and keep things tidy.)
I've created a script to generate graphs of the situation but I think that some regions/countries are duplicated in https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series.
For example, rows 44 and 45, 50 and 51, 64 and 65, 85, and 126 (see below).
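One way to look for such duplicates automatically is to group rows by a normalized (`Province/State`, `Country/Region`) key and report keys that occur more than once. This is only a sketch: it assumes those column names, and it catches duplicates that differ by case or surrounding whitespace, not ones that are spelled differently. The sample rows and the `find_duplicate_locations` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: the last two rows name the same place,
# differing only by a trailing space. The values are made up.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20
,France,46.2,2.2,1
Example Province,Example Country,0.0,0.0,2
Example Province ,Example Country,0.0,0.0,3
"""


def find_duplicate_locations(rows):
    """Map each repeated location key to the file lines where it occurs."""
    seen = defaultdict(list)
    # Line 1 is the header, so data starts at line 2. This numbering
    # assumes no quoted field spans several lines.
    for line_number, row in enumerate(rows, start=2):
        key = (
            row["Province/State"].strip().lower(),
            row["Country/Region"].strip().lower(),
        )
        seen[key].append(line_number)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


duplicates = find_duplicate_locations(csv.DictReader(io.StringIO(SAMPLE)))
for key, lines in duplicates.items():
    print(key, "appears on lines", lines)
```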