UKHSA-Internal / coronavirus-dashboard

Dashboard for tracking Coronavirus (COVID-19) across the UK
https://coronavirus.data.gov.uk
MIT License

Numbers for "Cases by date reported" and "Cases by date reported, by nation" are inconsistent #247

Closed: annapowellsmith closed this issue 4 years ago

annapowellsmith commented 4 years ago

What I expected to see

I expected the "Cases by date reported" chart of cases across the UK to show the same total numbers as the "Cases by date reported, by nation" chart, since the second is presumably just a more granular version of the first.

If the two differed, I expected a clear explanation as to why, and the process used to record each set of numbers.

What I actually saw

The UK totals do not match the sum of cases by nation. For example, on 6 May 2020 the new cases reported for the UK were 6,111, but the sum of new cases reported by each nation was 1,611 + 53 + 272 + 95 = 2,031.
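The size of the gap in this example can be checked with a short Python sketch. The four nation figures are taken from the issue in the order given. The issue does not say which figure belongs to which nation, so they are left unlabelled here.

```python
# Figures for 6 May 2020 as quoted in this issue.
# The issue does not state which value belongs to which nation.
uk_total = 6111
nation_values = [1611, 53, 272, 95]

nation_sum = sum(nation_values)
gap = uk_total - nation_sum

print(f"Sum of nations: {nation_sum:,}")
print(f"UK total minus sum of nations: {gap:,}")
```

On these figures the nations add up to 2,031, leaving 4,080 cases in the UK total that do not appear in any nation's figure for that day.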

Also, the chart of cumulative "Cases by date reported" for the UK looks like this:

[Chart: cumulative "Cases by date reported", UK]

But the chart of cumulative "Cases by date reported, by nation" for each devolved nation looks like this:

[Chart: cumulative "Cases by date reported, by nation"]

Clearly, there is some major difference in recording behaviour.

I tried to understand this by looking at the chart notes and documentation. These did not explain the difference.

The UK-wide "Cases by date reported" chart has an accompanying note:

On 2 July, case data from pillars 1 and 2 of the testing programme were combined and de-duplicated, resulting in a step decrease in the cumulative number of cases reported.

And the more granular "Cases by date reported, by nation" chart has an accompanying note:

Initially only pillar 1 tests (NHS and, in England, PHE labs) were included but pillar 2 (commercial tests) have been included from varying dates between 15th June and 14th July for each nation, leading to step increases in the numbers of cases reported at different times.

Firstly, it is unclear whether the first note also applies to the second chart, and vice versa.

Secondly, it is still unclear why the numbers in the two charts do not match. The about page does not clarify the issue.

Requested solution

Please explain clearly exactly which positive test results have been recorded on each chart, at each date, and why the numbers are different.

This will be most helpfully done by documenting the qualifying criteria used to create each set of numbers, at each date, in each case.

xenatisch commented 4 years ago

Thanks for reaching out.

I'm afraid this issue falls outside of the dev team's domain. I will share it with our data team, but I suggest you use the email on the website to get in touch too.

FYI - @statsgeekclare @PHEgeorginaanderson

annapowellsmith commented 4 years ago

Thanks. I sent a link to this GitHub issue to coronavirus-tracker@phe.gov.uk as suggested.

This is the reply I got by email:

Please see also ‘About the data’. In the chart ‘by nation’, the additional pillar 2 data that could retrospectively not be assigned to an exact date has been entered at the mid-point of the related period (15th June to 14th July). At the same time a deduplication (to count people once even if there are multiple tests) taking effect on 2nd July reduced the number of cases and shows as this decrease in the chart Cases by date reported. I appreciate that this may in places not present itself in a completely consistent way but I’m sure you’ll understand that the co-ordination of four nations’ data where definitions and methods are still evolving and not always perfectly matching is a challenge that required some adjustments in the process.

I'm not sure I fully understand this, but sharing anyway in case it helps anyone.
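As a rough illustration of the two adjustments described in the email reply, here is a minimal Python sketch. It assumes that undated pillar 2 results are given a single date halfway through 15 June to 14 July 2020 (rounded down to a whole day), and that deduplication counts each person once on the date of their first positive result. The reply does not say which date is kept for a deduplicated person, so that rule, the helper names and the sample records are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def midpoint(start: date, end: date) -> date:
    # Halfway between two dates, rounded down to a whole day (assumed rounding).
    return start + timedelta(days=(end - start).days // 2)


def deduplicated_daily_counts(results):
    """Count positive results per day, counting each person only once.

    `results` is an iterable of (person_id, result_date) pairs.
    Assumption: each person is kept on their earliest positive result date.
    """
    first_seen = {}
    for person_id, result_date in results:
        if person_id not in first_seen or result_date < first_seen[person_id]:
            first_seen[person_id] = result_date
    return Counter(first_seen.values())


# Hypothetical: undated pillar 2 results are placed at the mid-point of the window.
backlog_date = midpoint(date(2020, 6, 15), date(2020, 7, 14))
print(backlog_date)

# Hypothetical positive results; person "a" tested positive twice.
results = [
    ("a", date(2020, 7, 1)),
    ("a", date(2020, 7, 3)),
    ("b", date(2020, 7, 1)),
    ("c", backlog_date),
]
counts = deduplicated_daily_counts(results)
print(len(results), sum(counts.values()))
```

Under these assumptions the backlog lands on 29 June 2020, and four positive results become three cases once person "a" is counted only once. Applied to one series but not the other, adjustments like these would produce a step increase in one chart and a step decrease in the other.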