CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Duplicate values in the time series? #398

Closed: Vincent-Stragier closed this issue 4 years ago

Vincent-Stragier commented 4 years ago

I've created a script to generate graphs of the situation, but I think some regions/countries are duplicated in https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series.

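For example: entries 44 and 45 (Hong Kong / Hong Kong SAR), 50 and 51 (Iran / Iran (Islamic Republic of)), 64 and 65 (Macao SAR / Macau), and 85 and 126 (Palestine / occupied Palestinian territory). See the list below.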

Process (1/126): Afghanistan
Process (2/126): Albania
Process (3/126): Algeria
Process (4/126): Andorra
Process (5/126): Argentina
Process (6/126): Armenia
Process (7/126): Australia
Process (8/126): Austria
Process (9/126): Azerbaijan
Process (10/126): Bahrain
Process (11/126): Bangladesh
Process (12/126): Belarus
Process (13/126): Belgium
Process (14/126): Bhutan
Process (15/126): Bosnia and Herzegovina
Process (16/126): Brazil
Process (17/126): Brunei
Process (18/126): Bulgaria
Process (19/126): Burkina Faso
Process (20/126): Cambodia
Process (21/126): Cameroon
Process (22/126): Canada
Process (23/126): Channel Islands
Process (24/126): Chile
Process (25/126): Colombia
Process (26/126): Costa Rica
Process (27/126): Croatia
Process (28/126): Cyprus
Process (29/126): Czech Republic
Process (30/126): Denmark
Process (31/126): Dominican Republic
Process (32/126): Ecuador
Process (33/126): Egypt
Process (34/126): Estonia
Process (35/126): Faroe Islands
Process (36/126): Finland
Process (37/126): France
Process (38/126): French Guiana
Process (39/126): Georgia
Process (40/126): Germany
Process (41/126): Gibraltar
Process (42/126): Greece
Process (43/126): Holy See
Process (44/126): Hong Kong
Process (45/126): Hong Kong SAR
Process (46/126): Hungary
Process (47/126): Iceland
Process (48/126): India
Process (49/126): Indonesia
Process (50/126): Iran
Process (51/126): Iran (Islamic Republic of)
Process (52/126): Iraq
Process (53/126): Ireland
Process (54/126): Israel
Process (55/126): Italy
Process (56/126): Japan
Process (57/126): Jordan
Process (58/126): Kuwait
Process (59/126): Latvia
Process (60/126): Lebanon
Process (61/126): Liechtenstein
Process (62/126): Lithuania
Process (63/126): Luxembourg
Process (64/126): Macao SAR
Process (65/126): Macau
Process (66/126): Mainland China
Process (67/126): Malaysia
Process (68/126): Maldives
Process (69/126): Malta
Process (70/126): Martinique
Process (71/126): Mexico
Process (72/126): Moldova
Process (73/126): Monaco
Process (74/126): Mongolia
Process (75/126): Morocco
Process (76/126): Nepal
Process (77/126): Netherlands
Process (78/126): New Zealand
Process (79/126): Nigeria
Process (80/126): North Macedonia
Process (81/126): Norway
Process (82/126): Oman
Process (83/126): Others
Process (84/126): Pakistan
Process (85/126): Palestine
Process (86/126): Panama
Process (87/126): Paraguay
Process (88/126): Peru
Process (89/126): Philippines
Process (90/126): Poland
Process (91/126): Portugal
Process (92/126): Qatar
Process (93/126): Republic of Korea
Process (94/126): Republic of Moldova
Process (95/126): Romania
Process (96/126): Russia
Process (97/126): Russian Federation
Process (98/126): Saint Barthelemy
Process (99/126): Saint Martin
Process (100/126): San Marino
Process (101/126): Saudi Arabia
Process (102/126): Senegal
Process (103/126): Serbia
Process (104/126): Singapore
Process (105/126): Slovakia
Process (106/126): Slovenia
Process (107/126): South Africa
Process (108/126): South Korea
Process (109/126): Spain
Process (110/126): Sri Lanka
Process (111/126): St. Martin
Process (112/126): Sweden
Process (113/126): Switzerland
Process (114/126): Taipei and environs
Process (115/126): Taiwan
Process (116/126): Thailand
Process (117/126): Togo
Process (118/126): Tunisia
Process (119/126): UK
Process (120/126): US
Process (121/126): Ukraine
Process (122/126): United Arab Emirates
Process (123/126): Vatican City
Process (124/126): Viet Nam
Process (125/126): Vietnam
Process (126/126): occupied Palestinian territory
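One way to collapse these duplicates before plotting is to map every alternate label to a single canonical name. This is only an illustration, not the JHU team's fix: `ALIASES`, `canonical` and `find_duplicates` are hypothetical names, and the alias table only covers the four pairs listed at the top of this issue. The list also seems to contain other pairs (e.g. Russia / Russian Federation, Viet Nam / Vietnam) that would need to be added the same way.

```python
# Hypothetical alias table: maps each alternate label reported in the
# time series to one canonical country/region name.
ALIASES = {
    "Hong Kong SAR": "Hong Kong",
    "Iran (Islamic Republic of)": "Iran",
    "Macao SAR": "Macau",
    "occupied Palestinian territory": "Palestine",
}


def canonical(name: str) -> str:
    """Return the canonical name for a country/region label."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def find_duplicates(names: list[str]) -> dict[str, list[str]]:
    """Group labels by canonical name and keep only groups with >1 label."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(canonical(name), []).append(name)
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


if __name__ == "__main__":
    labels = ["Hong Kong", "Hong Kong SAR", "Iran",
              "Iran (Islamic Republic of)", "France"]
    print(find_duplicates(labels))
    # {'Hong Kong': ['Hong Kong', 'Hong Kong SAR'], 'Iran': ['Iran', 'Iran (Islamic Republic of)']}
```

Running `canonical()` over the `Country/Region` values before grouping makes the script process each place once instead of twice.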
pixelscript commented 4 years ago

Yes, and some rows are also missing data because the series continues under the duplicated name. This means the data needs some manual cleanup before it can be used easily.
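Where a place's series stops under one name and continues under the other, the two rows can be stitched back together date by date. A minimal sketch, assuming both rows hold cumulative counts over the same date columns and that a blank cell (read as `None`) simply means the place was not reported under that name on that date; `merge_split_series` is a hypothetical helper.

```python
from typing import Optional

Series = list[Optional[int]]


def merge_split_series(old: Series, new: Series) -> Series:
    """Combine two rows that report the same place under different names.

    Assumes both rows hold cumulative counts on the same date columns and
    that None marks a date not reported under that name.
    """
    if len(old) != len(new):
        raise ValueError("rows must cover the same date columns")
    merged: Series = []
    for a, b in zip(old, new):
        present = [v for v in (a, b) if v is not None]
        # Both rows describe the same cases, so keep the larger cumulative
        # value instead of summing (summing would double-count).
        merged.append(max(present) if present else None)
    return merged
```

Taking the maximum rather than the sum is the design choice that matters here: the two rows are aliases of one place, not two separate regions.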

yy commented 4 years ago

I've made this visualization: http://yyahn.com/covid19/ and published my workflow: https://github.com/yy/covid19-data

This workflow converts the dataset into a tidy (long) format and then merges it with World Bank statistics. Feel free to use any parts of it!
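For anyone who wants to do the wide-to-long conversion without extra dependencies, here is a minimal standard-library sketch. It is not the code from the covid19-data repository; it assumes the time-series CSV has `Province/State`, `Country/Region`, `Lat` and `Long` columns followed by one column per date, and `to_long` is a hypothetical name.

```python
import csv
import io

# Assumed layout: four identifier columns, then one column per date.
ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]


def to_long(csv_text: str) -> list[tuple[str, str, str, int]]:
    """Melt the wide table into (province, country, date, count) records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    records = []
    for row in reader:
        for date in date_columns:
            cell = (row[date] or "").strip()
            if cell:  # skip blank cells instead of inventing zeros
                records.append((row["Province/State"], row["Country/Region"],
                                date, int(cell)))
    return records
```

The long records can then be joined to other tables (such as per-country statistics) on the country name.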

pixelscript commented 4 years ago

I'm going to be doing something similar for: https://github.com/pixelscript/covid-19-map

I don't know why they didn't just rename the existing rows instead of having duplicates.

klahoda commented 4 years ago

I've noticed that states were added to the US data in the time series today. This effectively duplicates data from earlier. Seeing the jump in numbers after adding today's data initially freaked me out until I realized the issue. I wonder if there might be a way to consolidate the data; for example, a simple country-level line with all data by date would be much appreciated, like what you have for France: just the data for the entire US in one row. Otherwise I'll need to massage the data at each update instead of using this file as is.
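Until a country-level row exists, one workaround is to sum the sub-country rows per date. This is a minimal sketch under an assumed CSV layout (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date); `country_totals` is a hypothetical helper. Summing is only correct if a country's rows do not overlap: if, as described above, the new state rows repeat data already present in other rows, those rows have to be dropped first or the totals will double-count.

```python
import csv
import io
from collections import defaultdict

# Assumed layout: four identifier columns, then one column per date.
ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]


def country_totals(csv_text: str) -> dict[str, dict[str, int]]:
    """Sum every Province/State row into one row per Country/Region.

    Blank cells count as 0. Assumes a country's rows do not overlap.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    # Each new country starts with a zero count for every date column.
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: dict.fromkeys(date_columns, 0))
    for row in reader:
        country_row = totals[row["Country/Region"]]
        for date in date_columns:
            cell = (row[date] or "").strip()
            country_row[date] += int(cell) if cell else 0
    return dict(totals)
```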

yy commented 4 years ago

@klahoda Yes, separating country-level statistics and sub-country numbers would be great!

sibblegp commented 4 years ago

400

cipriancraciun commented 4 years ago

Have these issues been solved by the latest releases? (If so, please close this ticket in order to help the JHU team and keep things tidy.)