disease-sh / API

API for Current cases and more stuff about COVID-19 and Influenza
https://disease.sh
GNU General Public License v3.0

[BUG] Discrepancy with UK COVID19 Data #804

Closed: 1CM69 closed this issue 4 years ago.

1CM69 commented 4 years ago

Firstly, thanks for putting the time into this API. However, I seem to be getting some discrepancies between the data returned by API v3 and the actual data this API scrapes from the UK Government website.

Looking at the historic data, the API returns the following cumulative deaths:

  "7/20/20": 45397,
  "7/21/20": 45507,
  "7/22/20": 45586,
  "7/23/20": 45639,
  "7/24/20": 45762,
  "7/25/20": 45823,
  "7/26/20": 45837,
  "7/27/20": 45844,
  "7/28/20": 45963

Below is an excerpt from the UK Government website: https://api.coronavirus-staging.data.gov.uk/v1/data?filters=areaName=United%2520Kingdom;areaType=overview&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newDeathsByPublishDate%22:%22newDeathsByPublishDate%22,%22cumDeathsByPublishDate%22:%22cumDeathsByPublishDate%22%7D&format=json

This is the same site that is scraped to build the API, and it returns the following for the same time period:

{
   "length":146,
   "maxPageLimit":1000,
   "data":[
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-29",
         "newDeathsByPublishDate":83,
         "cumDeathsByPublishDate":45961
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-28",
         "newDeathsByPublishDate":119,
         "cumDeathsByPublishDate":45878
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-27",
         "newDeathsByPublishDate":7,
         "cumDeathsByPublishDate":45759
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-26",
         "newDeathsByPublishDate":14,
         "cumDeathsByPublishDate":45752
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-25",
         "newDeathsByPublishDate":61,
         "cumDeathsByPublishDate":45738
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-24",
         "newDeathsByPublishDate":123,
         "cumDeathsByPublishDate":45677
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-23",
         "newDeathsByPublishDate":53,
         "cumDeathsByPublishDate":45554
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-22",
         "newDeathsByPublishDate":79,
         "cumDeathsByPublishDate":45501
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-21",
         "newDeathsByPublishDate":110,
         "cumDeathsByPublishDate":45422
      },
      {
         "areaType":"overview",
         "areaName":"United Kingdom",
         "areaCode":"K02000001",
         "date":"2020-07-20",
         "newDeathsByPublishDate":11,
         "cumDeathsByPublishDate":45312
      }
   ]
}

The obvious error straight away is that the UK Government site's current death total for 2020-07-29 (45961) is lower than the API's total for the previous day, 2020-07-28 (45963).

This is doubly strange, because the URL https://disease.sh/v3/covid-19/countries/UK

shows the total number of deaths for today, 2020-07-29, as 45961 (see below):

{
  "updated": 1596045105410,
  "country": "UK",
  "countryInfo": {
    "_id": 826,
    "iso2": "GB",
    "iso3": "GBR",
    "lat": 54,
    "long": -2,
    "flag": "https://disease.sh/assets/img/flags/gb.png"
  },
  "cases": 301455,
  "todayCases": 763,
  "deaths": 45961,
  "todayDeaths": 83,

This does indeed match the official UK figure.

I have checked the historic death totals returned by the API against the official totals for the last 10 days, and each API figure is exactly 85 higher than the official total for the same date (for example, 45397 vs 45312 on 2020-07-20).

I have not bothered going back any further to check.
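The comparison above can be reproduced with a short script. This is a minimal sketch, assuming the two series are copied by hand from the excerpts in this issue: the API's historical keys use an "M/D/YY" format (e.g. "7/20/20") while the UK Government response uses ISO dates, so both are normalised to `datetime.date` before subtracting. The names `api_deaths`, `gov_deaths`, `parse_api_key` and `daily_offsets` are illustrative only.

```python
from datetime import date, datetime

# Cumulative UK deaths from the API's historical data, copied from the excerpt above.
api_deaths = {
    "7/20/20": 45397, "7/21/20": 45507, "7/22/20": 45586,
    "7/23/20": 45639, "7/24/20": 45762, "7/25/20": 45823,
    "7/26/20": 45837, "7/27/20": 45844, "7/28/20": 45963,
}

# cumDeathsByPublishDate from the UK Government response, keyed by ISO date.
gov_deaths = {
    "2020-07-20": 45312, "2020-07-21": 45422, "2020-07-22": 45501,
    "2020-07-23": 45554, "2020-07-24": 45677, "2020-07-25": 45738,
    "2020-07-26": 45752, "2020-07-27": 45759, "2020-07-28": 45878,
    "2020-07-29": 45961,
}


def parse_api_key(key: str) -> date:
    """Convert an 'M/D/YY' key such as '7/20/20' into a date."""
    return datetime.strptime(key, "%m/%d/%y").date()


def daily_offsets(api: dict, gov: dict) -> dict:
    """Return api - gov for every date present in both series."""
    gov_by_date = {date.fromisoformat(k): v for k, v in gov.items()}
    offsets = {}
    for key, value in api.items():
        day = parse_api_key(key)
        if day in gov_by_date:
            offsets[day] = value - gov_by_date[day]
    return offsets


offsets = daily_offsets(api_deaths, gov_deaths)
for day, diff in sorted(offsets.items()):
    print(day.isoformat(), diff)
```

Run against the figures in this issue, every overlapping date from 2020-07-20 to 2020-07-28 shows the same difference of 85.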

pujux commented 4 years ago

I don't have much time, so I only skimmed this, but I am fairly sure this is because the historical endpoints use JHU (Johns Hopkins University) as their data source rather than official government data. If you want official government data, please check our /gov endpoints.

1CM69 commented 4 years ago

I checked the JHU data while you were posting your reply, and I can see that the historic data matches what JHU is putting out.

Thanks, I'll look into the /gov endpoints. For the moment I have manually set up a subtraction of 85 from the historic records I collect, but no doubt this will break at some point.
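One way to make a stop-gap correction like this fail loudly instead of silently is to re-derive the offset from dates where a trusted reference is available, and refuse to apply it if those dates disagree. The sketch below is a hypothetical helper (`correct_series` is not part of the disease.sh API); it assumes both series are already keyed by `datetime.date`. It does not replace switching to the official data source.

```python
from datetime import date


def correct_series(api: dict, reference: dict) -> dict:
    """Shift `api` onto `reference` using the offset seen on overlapping dates.

    Raises ValueError when there is no overlap, or when the overlapping dates
    disagree on the offset, which is exactly when a hard-coded subtraction
    would start producing wrong numbers without any warning.
    """
    overlap = [day for day in api if day in reference]
    if not overlap:
        raise ValueError("no overlapping dates to estimate an offset from")
    diffs = {api[day] - reference[day] for day in overlap}
    if len(diffs) != 1:
        raise ValueError(f"offset is not constant: {sorted(diffs)}")
    offset = diffs.pop()
    # Apply the single agreed offset to every date in the API series.
    return {day: value - offset for day, value in api.items()}


# Example with two of the dates from this issue.
api = {date(2020, 7, 27): 45844, date(2020, 7, 28): 45963}
gov = {date(2020, 7, 27): 45759, date(2020, 7, 28): 45878}
corrected = correct_series(api, gov)
print(corrected[date(2020, 7, 28)])
```

With the values above the derived offset is 85, so the 2020-07-28 figure becomes 45878; if JHU's offset ever drifts, the helper raises instead of returning shifted data.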

Thanks for replying.

pujux commented 4 years ago

I would really advise you to switch over to the /gov/uk endpoint! 👍