Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE

India update time & stale data #3135

Open CSSEGISandData opened 4 years ago

CSSEGISandData commented 4 years ago

Hello all,

Several issues have been submitted this week regarding India data going stale and failing to update within a 24-hour period, so we're pinning this issue to explain the cause.

Our data is sourced from India's Ministry of Health and Family Welfare (linked here). As detailed in our README file in the csse_covid_19_data folder, our daily products used to be produced between 03:30 and 04:00 UTC. In early June, the update time for India shifted slightly later, so on June 15 we adjusted our production time to be between 04:45 and 05:15 UTC to catch their early morning update. In that sense, our data has historically captured the data India releases for the next day by the time our daily products are generated for midnight on the East Coast of the United States (when we capture a global 'snapshot' of the epidemiological data around the world; the date in EST is the header for the time series files and daily reports). Of note, this is why our data has always appeared to be one day "faster" than Wikipedia's.
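For readers who want to see how the 04:45 to 05:15 window lines up across time zones, here is a minimal sketch. It is only an illustration, not part of our pipeline. It assumes the year 2020 and an arbitrary example date (16 June), and it uses fixed offsets (UTC-4 for US Eastern daylight time in June, UTC+5:30 for India) rather than a time zone database. The helper name `local_views` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for a date in June 2020:
# US Eastern is on daylight time (UTC-4); India Standard Time is UTC+5:30.
US_EASTERN = timezone(timedelta(hours=-4), "EDT")
INDIA = timezone(timedelta(hours=5, minutes=30), "IST")


def local_views(utc_dt):
    """Return the same instant expressed in US Eastern and India time."""
    return {
        "us_eastern": utc_dt.astimezone(US_EASTERN),
        "india": utc_dt.astimezone(INDIA),
    }


# Start and end of the production window on an example date (assumption).
window_start = datetime(2020, 6, 16, 4, 45, tzinfo=timezone.utc)
window_end = datetime(2020, 6, 16, 5, 15, tzinfo=timezone.utc)

for label, instant in (("start", window_start), ("end", window_end)):
    views = local_views(instant)
    print(
        f"{label}: {instant:%Y-%m-%d %H:%M} UTC"
        f" = {views['us_eastern']:%Y-%m-%d %H:%M %Z}"
        f" = {views['india']:%Y-%m-%d %H:%M %Z}"
    )
```

Under these assumptions the window falls shortly after midnight on the US East Coast and in the mid-morning of the same calendar date in India, which is why an update published that morning in India can land in the report dated the previous day.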

In the last week, India's reporting has become more erratic, and on multiple days their early morning update was not released before our daily products were generated. When this happens, the country releases no new data within the 24-hour period between our product generation runs, and as such the data appears to have gone stale. To be clear, this is caused by the source not updating; it is not an issue with our infrastructure. In the short term, we will manually update the time series files to capture the early morning update so that our data is consistent. We are also investigating alternative methods to ensure these cases are still included moving forward.
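For anyone who wants to spot these stale days in their own copy of the data, a minimal sketch follows. It is not the code used to produce the files. The function name `find_stale_dates` is hypothetical, and the numbers are made up purely for illustration. It assumes you have already extracted one location's cumulative counts as (date, count) pairs in date order.

```python
def find_stale_dates(series):
    """Return the dates whose cumulative count equals the previous date's.

    `series` is a list of (date_label, cumulative_count) pairs in date order.
    An unchanged cumulative count suggests the source published no update
    before that day's snapshot was taken.
    """
    stale = []
    for (_, prev_count), (date_label, count) in zip(series, series[1:]):
        if count == prev_count:
            stale.append(date_label)
    return stale


# Made-up values for illustration only; these are not real case counts.
example = [
    ("7/1/20", 100),
    ("7/2/20", 120),
    ("7/3/20", 120),  # no change from the previous day: flagged as stale
    ("7/4/20", 150),
]

print(find_stale_dates(example))  # ['7/3/20']
```

An unchanged count is only a hint: a location can also legitimately report zero new cases, so flagged dates still need to be checked against the source.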

Thank you for your patience.

herr-barus commented 3 years ago

Hello CSSEGISandData, the data for India in csse_covid_19_data\csse_covid_19_daily_reports\09-20-2020.csv are still wrong (lines 236-271). Kind regards, Barus