resurepus closed this issue 4 years ago
The NOT SPECIFIED entry is used when no specific regional data is available. When specific data became available, the last known value for NOT SPECIFIED was still included in the sums. Adding a 0 record for NOT SPECIFIED dated July 18th solved the problem. If necessary, we can recreate all the files.
Thanks for taking care of this. Now I use just the 'latest'-files, so it is not necessary to recreate the files. Will open a separate issue for the Iceland variant of this.
This happened first for July 18th. From jrc-covid-19-countries-20200717.csv and jrc-covid-19-countries-20200718.csv:
2020-07-17 | NLD | Netherlands | 52.133057 | 5.29525 | 51454 | 6138
2020-07-18 | NLD | Netherlands | 52.133057 | 5.29525 | 51581 | 12274
This continues for subsequent daily files. It is not present in jrc-covid-19-all-days-by-country.csv:
2020-08-14,NLD,Netherlands,52.133057,5.29525,61840,6167
The artifact seems to come from a "NOT SPECIFIED" regional entry that was present on July 17th but not on later days. Regional data was added after that, but the NOT SPECIFIED count was not reset. Here is the entry from jrc-covid-19-regions-latest.csv:
2020-07-17 | NLD | Netherlands | NOT SPECIFIED | | | 0 | 6138
Similarly for Iceland: from July 2nd onward, CumulativePositive is higher in the 'single date' and 'latest' files than in the 'all-days' file.
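For what it's worth, the Netherlands numbers above fit the stale-entry explanation: 12274 - 6138 = 6136, so the July 18th total looks like the old NOT SPECIFIED value plus the new regional sum. Below is a minimal sketch of how that could happen, assuming the country total is built by summing the last known value per region. The function `country_total`, the region names "Region A" and "Region B", and their values are hypothetical, chosen only so the sums match the figures quoted above; this is not the actual JRC aggregation code.

```python
def country_total(records, day):
    """Sum the last known value per region on or before `day`.

    `records` is a list of (date, region, value) tuples with ISO dates,
    so plain string comparison orders them chronologically.
    """
    latest = {}
    for date, region, value in sorted(records):
        if date <= day:
            latest[region] = value  # later dates overwrite earlier ones
    return sum(latest.values())


records = [
    ("2020-07-17", "NOT SPECIFIED", 6138),  # value from the regions file
    ("2020-07-18", "Region A", 4000),       # hypothetical regional value
    ("2020-07-18", "Region B", 2136),       # hypothetical regional value
]

# The stale NOT SPECIFIED value is still counted next to the new regions.
inflated = country_total(records, "2020-07-18")

# Adding an explicit 0 record for NOT SPECIFIED resets the carried value.
fixed_records = records + [("2020-07-18", "NOT SPECIFIED", 0)]
fixed = country_total(fixed_records, "2020-07-18")

print(inflated, fixed)
```

With these illustrative inputs the first total is inflated by exactly the old NOT SPECIFIED value, and the 0 record removes it, which matches the fix described in the maintainer's reply.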