tendersoft-mjb closed this 4 years ago
There have been a few complaints about this field. We have removed it from all of the time series data, due to its misleading and possibly inaccurate values. Sorry for the confusion.
Guys, you just inserted a new sheet, "Announcement", into the Google spreadsheet. That effectively breaks any algorithm that parses the datetime information from the sheet names. PLEASE: if you want to convey new information via the spreadsheet, place it in an unused location, e.g. a previously empty column! Thank you.
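In the meantime, a parser can protect itself by skipping any sheet whose name does not look like a timestamp. Below is a minimal sketch in Python. It assumes sheet names follow a pattern like `Feb02_9PM`; that format, the `year` default, and the function name are assumptions for illustration, not something confirmed in this thread.

```python
from datetime import datetime

# Hypothetical sheet-name pattern such as "Feb02_9PM"; adjust to the real one.
SHEET_NAME_FORMAT = "%b%d_%I%p"


def parse_sheet_timestamps(sheet_names, year=2020):
    """Map each timestamp-like sheet name to a datetime; skip all other sheets."""
    parsed = {}
    for name in sheet_names:
        try:
            # Prepend the year so that dates such as Feb 29 parse correctly.
            stamp = datetime.strptime(f"{year} {name}", "%Y " + SHEET_NAME_FORMAT)
        except ValueError:
            continue  # not a timestamp, e.g. an "Announcement" sheet
        parsed[name] = stamp
    return parsed


if __name__ == "__main__":
    print(parse_sheet_timestamps(["Announcement", "Feb02_9PM"]))
```

Skipping unknown names silently is a design choice; logging them instead would make a surprise sheet easier to notice.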
@CSSEGISandData IMHO removing the column completely is the worst possible solution, because
For example, with the date of patient 0 in each area we could track the growth rate of the local epidemic - see below. Note: China is on the right Y-axis, all others on the left Y-axis.
Sure, there are errors in the data, e.g. a date of first reported case that is later than the earliest date in the daily updates, hence the -1 on the X-axis. Still, it's very useful to know whether local epidemics are progressing slower or faster than the main one.
The 'First confirmed date in province/country' column is useful especially for Chinese data, where we do not have the daily updates for at least 18 days after patient 0 was diagnosed.
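The alignment described above amounts to re-indexing each area's observations by days elapsed since its own first confirmed case. A rough sketch of that step, with a hypothetical function name and made-up example dates chosen only to show how a -1 can appear:

```python
from datetime import date


def days_since_first_case(observation_dates, first_confirmed):
    """Return each observation date as an offset in days from the first confirmed case."""
    return [(day - first_confirmed).days for day in observation_dates]


if __name__ == "__main__":
    # Illustrative dates only: the first daily update predates the recorded
    # first-case date by one day, so the first offset is negative.
    updates = [date(2020, 1, 21), date(2020, 1, 22), date(2020, 1, 23)]
    print(days_since_first_case(updates, first_confirmed=date(2020, 1, 22)))
```

A negative offset is a cheap consistency check: it flags areas where the recorded first-case date is later than the first daily update.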
I encourage you to add this column back. However, to keep the shape of the data from changing again you could add another spreadsheet/CSV with just the following data:
| Province/State | Country/Region | First confirmed date in province/country | First confirmed date in country | Lat | Long |
|---|---|---|---|---|---|
| Anhui | Mainland China | 22.01.2020 | 03.01.2020 | 31,82571 | 117,2264 |
| Beijing | Mainland China | 21.01.2020 | 03.01.2020 | 40,18238 | 116,4142 |
| Zhejiang | Mainland China | 21.01.2020 | 03.01.2020 | 29,18251 | 120,0985 |
| | Thailand | 21.01.2020 | 21.01.2020 | 13,7563 | 100,5018 |
| | Japan | 21.01.2020 | 21.01.2020 | 35,6762 | 139,6503 |
This worksheet would be mostly static, changing only if and when a new area gets infected. All other worksheets/CSVs could be updated almost independently from this one.
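On the consumer side, such a static worksheet could be loaded once into a lookup keyed by area. The sketch below assumes the values arrive exactly as in the sample rows above (dates as `DD.MM.YYYY`, coordinates with decimal commas); the function names and dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_day(text):
    """Parse a DD.MM.YYYY string into a date."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def parse_coordinate(text):
    """Parse a coordinate written with a decimal comma, e.g. '31,82571'."""
    return float(text.replace(",", "."))


def build_first_case_lookup(rows):
    """Build {(province, country): details} from 6-field rows of strings."""
    lookup = {}
    for province, country, first_local, first_country, lat, lon in rows:
        lookup[(province, country)] = {
            "first_local": parse_day(first_local),
            "first_country": parse_day(first_country),
            "lat": parse_coordinate(lat),
            "long": parse_coordinate(lon),
        }
    return lookup


# Two rows copied from the sample table; an empty province means country level.
SAMPLE_ROWS = [
    ("Anhui", "Mainland China", "22.01.2020", "03.01.2020", "31,82571", "117,2264"),
    ("", "Thailand", "21.01.2020", "21.01.2020", "13,7563", "100,5018"),
]

if __name__ == "__main__":
    for key, details in build_first_case_lookup(SAMPLE_ROWS).items():
        print(key, details)
```

Because the key is the (province, country) pair, the time series files could be joined against this lookup without carrying the date columns themselves.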
A column in time_series_2019-ncov-Confirmed.csv used to be named 'First confirmed date in country (est.)' but is now named 'First confirmed date in country' - this small change breaks all the downstream analytics.
Besides, the column name is misleading since it contains dates of first confirmed cases in either state/province or in country - depending on which is the smaller administrative unit.
There could be 2 columns:
The 1st one showing data from former/current column 'First confirmed date in country (est.)'/'First confirmed date in country'.
The 2nd one showing the actual first date for the country as a whole. This column is not strictly necessary, since people who need it will add it on their side, but it would preserve backward compatibility with existing analytical solutions.
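For those who would add the country-level date on their side, one possible approach is to take the earliest per-area date within each country. This is only a sketch with a hypothetical function name, and the result is only as good as the per-area dates: if the true first case in a country predates every recorded area date, the derived value will be too late.

```python
from datetime import date


def first_date_per_country(local_first_dates):
    """Reduce {(province, country): date} to {country: earliest date}."""
    earliest = {}
    for (_province, country), day in local_first_dates.items():
        if country not in earliest or day < earliest[country]:
            earliest[country] = day
    return earliest


if __name__ == "__main__":
    local = {
        ("Anhui", "Mainland China"): date(2020, 1, 22),
        ("Beijing", "Mainland China"): date(2020, 1, 21),
        ("", "Japan"): date(2020, 1, 21),
    }
    print(first_date_per_country(local))
```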
Either way, please keep the names and data consistent; changes like these cause errors and confusion in the analytics pipeline down the line.
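Until the names are stable, downstream code can guard against renames like the one described above by accepting every known spelling of the column and degrading gracefully when it is missing. A minimal sketch using only the Python standard library; the two aliases are the names quoted in this thread, while the function names and the choice to return an empty result are my own assumptions.

```python
import csv
import io

# Known spellings of the column, oldest first (both quoted in this thread).
FIRST_DATE_ALIASES = (
    "First confirmed date in country (est.)",
    "First confirmed date in country",
)


def resolve_column(fieldnames, aliases=FIRST_DATE_ALIASES):
    """Return the first alias found in the header, or None if the column is gone."""
    for name in aliases:
        if name in fieldnames:
            return name
    return None


def read_first_dates(csv_text):
    """Return {(province, country): raw date string}; empty if the column was removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    column = resolve_column(reader.fieldnames or [])
    if column is None:
        return {}
    return {
        (row["Province/State"], row["Country/Region"]): row[column]
        for row in reader
    }
```

This only papers over header renames; it cannot help when the meaning of the values changes, which is why stable names and semantics at the source matter more.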