Hi @savethecathode
Because I compute daily values as the diff of the cumulative values in the raw data, negative values will appear whenever a correction to the data (e.g., removing false positives, misclassifications, errors, etc.) is not applied retroactively (i.e., removed from the day the cases were originally added), since in most cases the data is anonymized. As a result, you will see a drop in the cumulative values in the raw data.
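To illustrate the effect, here is a minimal base-R sketch. It assumes daily counts are derived with `diff()` on a single country's cumulative series, with a 0 prepended so the first day keeps its value; the numbers and the name `cumulative` are hypothetical, not taken from the raw data. A downward correction in the cumulative total shows up as a negative daily value:

```r
# Hypothetical cumulative confirmed counts for one country (illustrative only).
# Day 4 reflects a correction that lowered the running total.
cumulative <- c(100, 150, 210, 180, 240)

# Daily new cases as the difference of consecutive cumulative values.
# Prepending 0 treats the first cumulative value as that day's count.
daily <- diff(c(0, cumulative))
print(daily)
# [1] 100  50  60 -30  60
```

The daily values still sum to the latest cumulative total, so the negative entry is simply where the correction lands in time.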
In the case of Spain, this drop can be seen in the cumulative confirmed cases in the raw data.
This issue in the data is related to the use of different data sources, and Johns Hopkins is trying to fix it. More information is available in the following issue:
In the CRAN version of the coronavirus package, the number of confirmed cases in Spain on 2020-04-24 is -10034, which seems like a mistake.
To visualize the number of confirmed cases in Spain, I used the following:

```r
library(coronavirus); library(dplyr); library(ggplot2)
coronavirus %>% filter(country == "Spain" & type == "confirmed") %>% ggplot(aes(date, cases)) + geom_line()
```
To identify the specific data point, I used the following:

```r
coronavirus[which.min(coronavirus$cases), ]
```
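Note that `which.min()` returns only the single most negative row. A sketch for listing every negative daily count instead, using base R on a small hypothetical data frame named `toy` that assumes the same `date`, `country`, `type` and `cases` columns as the `coronavirus` dataset (all values are made up):

```r
# Hypothetical rows mirroring the columns used above; values are illustrative.
toy <- data.frame(
  date    = as.Date(c("2020-04-23", "2020-04-24", "2020-04-24", "2020-04-25")),
  country = c("Spain", "Spain", "Italy", "Spain"),
  type    = "confirmed",
  cases   = c(4000, -500, -20, 4500),
  stringsAsFactors = FALSE
)

# which.min() finds only the single smallest value (row 2 here).
toy[which.min(toy$cases), ]

# Subsetting on cases < 0 keeps every negative row, sorted most negative first.
negatives <- toy[toy$cases < 0, ]
negatives <- negatives[order(negatives$cases), ]
print(negatives)
```

On the actual dataset, the equivalent subset would be `coronavirus[coronavirus$cases < 0, ]`.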