RamiKrispin / coronavirus

The coronavirus dataset
https://ramikrispin.github.io/coronavirus/

The number of confirmed cases in Spain on 2020-04-24 is a negative number #52

Closed savethecathode closed 4 years ago

savethecathode commented 4 years ago

In the CRAN version of the coronavirus package the number of confirmed cases in Spain on 2020-04-24 is -10034, which seems like a mistake.

To visualize the number of confirmed cases in Spain I used the following:
coronavirus %>% filter(country == "Spain" & type == "confirmed") %>% ggplot(aes(date, cases)) + geom_line()

To identify the specific data point I used the following:
coronavirus[which.min(coronavirus$cases), ]

RamiKrispin commented 4 years ago

Hi @savethecathode

As I am using the diff of the cumulative values in the raw data, negative values will occur whenever updates to the data (e.g., removing false positives, misclassifications, errors, etc.) are not applied retroactively (i.e., removed from the day they were originally added), since in most cases the data is anonymized. As a result, you will see a drop in the cumulative values in the raw data.
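To illustrate the mechanism, here is a minimal sketch in base R. The numbers are made up for illustration and are not taken from the raw data, and the variable names (`cumulative`, `daily`, `negative_days`) are hypothetical. It only shows how differencing a cumulative series that was corrected downward yields a negative daily value; it is not the package's actual processing code.

```r
# Made-up cumulative counts for five consecutive days (not real data).
# On day 4 a hypothetical correction makes the cumulative total drop by 30.
cumulative <- c(100, 150, 210, 180, 230)

# Daily values: keep the first value as-is, then take day-over-day differences.
daily <- c(cumulative[1], diff(cumulative))
print(daily)

# Positions of the days where the daily value is negative.
negative_days <- which(daily < 0)
print(negative_days)
```

Because the correction is applied on the day it was reported rather than on the days the cases were originally counted, the whole adjustment shows up as a single negative daily value.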

In the case of Spain, you can see this in the cumulative values of the confirmed cases in the raw data:

This issue in the data is related to the use of different data sources, and Johns Hopkins is trying to fix it. More information is available in the following issue