globaldothealth / monkeypox

Mpox 2022 repository

Inconsistency in date of confirmation causes noisy/spiky UK 7d average plot #161

Closed corneliusroemer closed 2 years ago

corneliusroemer commented 2 years ago

England has been releasing new case data twice weekly for a while now, always on Tuesdays and Fridays.

As their publishing schedule is quite regular, I was surprised to find the 7d average case rate plot for the UK to be so spiky:

[image: 7d average case plot for the UK]

When I investigated the date of confirmation for the English cases in your Google Sheet, I noticed what causes the spikes: the weekdays these English cases get attributed to are inconsistent.
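To illustrate the mechanism, here is a minimal sketch with made-up numbers (70 cases per batch, dated Mondays and Thursdays; none of this is real UKHSA data). The helper `rolling_7d_average` is a hypothetical name, and a trailing 7-day window is an assumption about how the plot is computed. With consistent dating, every window holds exactly two batches and the average is flat. Moving a single batch by two days leaves some windows with one batch and others with three.

```python
from datetime import date, timedelta


def rolling_7d_average(daily_counts, start, end):
    """Trailing 7-day average per day; days without an entry count as 0."""
    averages = {}
    day = start
    while day <= end:
        window_total = sum(
            daily_counts.get(day - timedelta(days=k), 0) for k in range(7)
        )
        averages[day] = window_total / 7
        day += timedelta(days=1)
    return averages


# Hypothetical batches: 70 cases each, always dated Monday or Thursday.
consistent = {
    date(2022, 7, 4): 70,
    date(2022, 7, 7): 70,
    date(2022, 7, 11): 70,
    date(2022, 7, 14): 70,
    date(2022, 7, 18): 70,
    date(2022, 7, 21): 70,
}

# Same batches, but the 14 July batch is dated two days later (16 July).
inconsistent = dict(consistent)
inconsistent[date(2022, 7, 16)] = inconsistent.pop(date(2022, 7, 14))

start, end = date(2022, 7, 10), date(2022, 7, 22)
flat = rolling_7d_average(consistent, start, end)
spiky = rolling_7d_average(inconsistent, start, end)

print(sorted(set(flat.values())))                 # one value: flat line
print(min(spiky.values()), max(spiky.values()))   # dip and spike
```

The total number of cases is identical in both series; only the dating of one batch differs.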

Could you double-check that you choose consistent weekdays for the date of confirmation? That way the spikes in the 7d average plots would disappear.

Thanks a lot!

Here's the pivot table I used to spot the issue:

[image: pivot table]
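A rough equivalent of that pivot table can be sketched in a few lines of Python. This assumes the sheet has been exported to rows that are already filtered to English cases and that carry a `Date_confirmation` column in ISO format. The function name `cases_by_weekday` and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def cases_by_weekday(rows):
    """Count rows per weekday of their Date_confirmation value."""
    counts = Counter()
    for row in rows:
        confirmed = date.fromisoformat(row["Date_confirmation"])
        counts[WEEKDAYS[confirmed.weekday()]] += 1
    return counts


# Hypothetical sample rows, not taken from the actual sheet.
rows = [
    {"Date_confirmation": "2022-06-26"},
    {"Date_confirmation": "2022-06-26"},
    {"Date_confirmation": "2022-06-24"},
    {"Date_confirmation": "2022-07-19"},
]

counts = cases_by_weekday(rows)
for name in WEEKDAYS:
    if counts[name]:
        print(name, counts[name])
```

If the dating were consistent, only the cutoff weekdays should show up; any count on a publication weekday points at rows worth checking.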

These are the official update dates; you should probably align with them:

[image: UKHSA publication update history]

See https://www.gov.uk/government/news/monkeypox-cases-confirmed-in-england-latest-updates#full-publication-update-history

You mostly seem to have chosen the cutoff date used by UKHSA, but sometimes you used the report day. It should be easy to fix: just filter to English cases and do some bulk editing. Thanks!
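If the bulk edit were scripted rather than done by hand, it could look roughly like the sketch below. It assumes rows with a `Date_confirmation` column and a lookup from publication date to UKHSA cutoff date. The names `PUBLICATION_TO_CUTOFF` and `align_to_cutoff` are hypothetical, and the single mapping entry is only an example (the 28 June 2022 overview with its 26 June cutoff); the remaining pairs would have to be read off the UKHSA reports.

```python
# Publication date -> cutoff date, both ISO strings.
# Only an example entry; the other pairs must come from the UKHSA reports.
PUBLICATION_TO_CUTOFF = {
    "2022-06-28": "2022-06-26",
}


def align_to_cutoff(rows, mapping):
    """Return copies of rows with publication dates replaced by cutoff dates."""
    aligned = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows stay untouched
        original = row["Date_confirmation"]
        fixed["Date_confirmation"] = mapping.get(original, original)
        aligned.append(fixed)
    return aligned


# Hypothetical English rows: one dated by publication day, one by cutoff.
english_rows = [
    {"ID": "a", "Date_confirmation": "2022-06-28"},
    {"ID": "b", "Date_confirmation": "2022-06-26"},
]

aligned_rows = align_to_cutoff(english_rows, PUBLICATION_TO_CUTOFF)
print([r["Date_confirmation"] for r in aligned_rows])
```

Dates that are not in the mapping pass through unchanged, so rows already dated by cutoff are not touched.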

tvarrelman commented 2 years ago

Hi @corneliusroemer, thank you for finding this issue and providing a detailed report. I can confirm that the G.h cases derived from the 2022-07-19, 2022-07-15, and 2022-06-24 Epidemiological Overviews had a Date_confirmation equal to the publication date instead of the cutoff date described by the UKHSA. I have resolved this issue.

As for the Date_confirmation of 2022-06-26, this was the cutoff date described by the UKHSA in the following report: https://www.gov.uk/government/publications/monkeypox-outbreak-epidemiological-overview/monkeypox-outbreak-epidemiological-overview-28-june-2022.

Thanks!

corneliusroemer commented 2 years ago

Excellent, thanks @tvarrelman!

There may be similar issues in other countries' data, so it would be great to check this systematically. You probably have a list of source websites for each country that you could check against.