ec-jrc / COVID-19

CumulativeDeceased for Netherlands issue #21

Closed resurepus closed 4 years ago

resurepus commented 4 years ago

The CumulativeDeceased value for the Netherlands roughly doubles, first in the file for July 18th. Compare jrc-covid-19-countries-20200717.csv and jrc-covid-19-countries-20200718.csv:

2020-07-17 | NLD | Netherlands | 52.133057 | 5.29525 | 51454 | 6138
2020-07-18 | NLD | Netherlands | 52.133057 | 5.29525 | 51581 | 12274

This continues for subsequent daily files. It is not present in jrc-covid-19-all-days-by-country.csv:

2020-08-14,NLD,Netherlands,52.133057,5.29525,61840,6167

The artifact seems to come from a "NOT SPECIFIED" regional entry that was present on July 17th but not on later days. After that date, regional data was added, but the NOT SPECIFIED count was not reset. Here is the entry from jrc-covid-19-regions-latest.csv:

2020-07-17 | NLD | Netherlands | NOT SPECIFIED |   |   | 0 | 6138

Similarly for Iceland: from July 2nd onward, CumulativePositive is higher in the 'single date' and 'latest' files than in the 'all-days' file.

JumbaJookiba commented 4 years ago

The NOT SPECIFIED entry is used when no region-specific data is available. When region-specific data became available, the last known value for NOT SPECIFIED was still included in the sums. Adding a 0 record for NOT SPECIFIED on July 18th solved the problem. If necessary, we can recreate all the files.
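A minimal sketch of the effect described above, assuming the country total is computed as the sum of the last known value per region. The function `country_total`, the region name `REGION_A`, and its value are hypothetical and only chosen so that the totals match the figures quoted in this issue; this is not the project's actual aggregation code.

```python
from datetime import date


def country_total(records, as_of):
    """Sum the last known value per region on or before `as_of`.

    `records` is an iterable of (day, region, cumulative_value) tuples.
    """
    latest = {}
    for day, region, value in sorted(records):
        if day <= as_of:
            latest[region] = value  # later records overwrite earlier ones
    return sum(latest.values())


records = [
    (date(2020, 7, 17), "NOT SPECIFIED", 6138),
    # Hypothetical regional record; the real file has several regions.
    (date(2020, 7, 18), "REGION_A", 6136),
]

# The stale NOT SPECIFIED value is still counted on July 18th.
print(country_total(records, date(2020, 7, 17)))  # 6138
print(country_total(records, date(2020, 7, 18)))  # 12274

# Adding a 0 record for NOT SPECIFIED on July 18th resets its contribution.
fixed = records + [(date(2020, 7, 18), "NOT SPECIFIED", 0)]
print(country_total(fixed, date(2020, 7, 18)))  # 6136
```

Under this assumption, any placeholder region that stops receiving updates needs an explicit 0 record once real regional data arrives; otherwise its last value keeps being carried forward into the sum.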

resurepus commented 4 years ago

Thanks for taking care of this. I now use only the 'latest' files, so there is no need to recreate the files. I will open a separate issue for the Iceland variant of this.