globaldothealth / monkeypox

Mpox 2022 repository

Inconsistency in German case counts: GH sometimes behind RKI website announcements #163

Closed: corneliusroemer closed this issue 1 year ago

corneliusroemer commented 2 years ago

After we worked out what was going on with the UK spikes (see #161), I decided to look at the German GH data, since German case counts are the ones I know best.

I can think of three possible sources for German case counts:

  1. RKI announcements on their public facing website
  2. SurvStat (database with more metadata)
  3. Newspaper/local announcements

Local announcements can be ahead of official RKI counts, which is why it's reasonable that GH was ahead of RKI case announcements early on.

Also, SurvStat can be ahead of website announcements, as cases can be dated earlier than they are announced.

Overall, GH should always have at least as many cases as the RKI had announced on its website, which it updates every weekday.

Luckily, @micb25 has scraped the website every day since the beginning of this outbreak, so we have a CSV file of how many cases the RKI had announced on each day.

I compared this CSV with the output of a GH pivot table filtered to Germany.

Unfortunately, there are some inconsistencies. It's acceptable for GH to sometimes be ahead of RKI announcements (as explained above), but I see no reason why the GH cumulative case count up to a given date should ever be behind the website announcements.

I have highlighted these days in red in this Google Sheet: https://docs.google.com/spreadsheets/d/1fv86Qh6UCUTS8091QvxTYd9ftg9T_AV7QkJHSWatu0M/edit#gid=153130654
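The check behind the red highlighting can be sketched roughly as follows. This is a minimal illustration, not the method used in the sheet, and it assumes two inputs: GH daily counts per confirmation date (as a pivot table would give them) and the RKI cumulative totals per announcement date (as in @micb25's CSV). The function name `days_gh_behind` and the toy numbers are hypothetical, and the sketch treats a GH confirmation date and an RKI announcement date as directly comparable, which is itself an assumption.

```python
from datetime import date


def days_gh_behind(
    gh_daily: dict[date, int], rki_cumulative: dict[date, int]
) -> list[tuple[date, int, int]]:
    """Return (day, gh_total, rki_total) for every RKI announcement day on
    which the GH cumulative count is lower than the RKI announced total.

    Hypothetical helper: gh_daily maps confirmation date -> new GH cases,
    rki_cumulative maps announcement date -> cumulative RKI total.
    """
    flagged = []
    for day in sorted(rki_cumulative):
        # GH cumulative count: all cases with a confirmation date on or before `day`.
        gh_total = sum(n for d, n in gh_daily.items() if d <= day)
        rki_total = rki_cumulative[day]
        # GH being ahead is tolerated; only being behind is flagged.
        if gh_total < rki_total:
            flagged.append((day, gh_total, rki_total))
    return flagged


# Toy numbers for illustration only, not real case counts.
gh_daily = {date(2022, 6, 1): 5, date(2022, 6, 2): 2, date(2022, 6, 3): 4}
rki_cumulative = {date(2022, 6, 1): 5, date(2022, 6, 2): 9, date(2022, 6, 3): 11}
print(days_gh_behind(gh_daily, rki_cumulative))
```

With these toy inputs, GH reaches 5, 7 and 11 cumulative cases, so only 2022-06-02 (7 vs. 9) is flagged; days where GH is ahead are deliberately not reported.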


Apparently, in the last week or two you have started using the exact counts from the RKI website: there is zero discrepancy in cumulative counts.

It would be great if you could reconcile the two sources and make sure that GH is never behind the RKI case counts.

I think my Google Sheet could be helpful.

Here's the link to @micb25's repo and data: https://github.com/micb25/RKI_Monkeypox/blob/main/data/RKI_Monkeypox.csv

If you have any questions, let me know.

In general, I think it would help to have conventions for which data sources to use and which confirmation date to use for each source; otherwise these "one-day-off" inconsistencies can cause annoying spikes in 7-day moving averages.

Thanks for your work!

aimeehan1 commented 1 year ago

Hello @corneliusroemer. G.h has ended line-list data collection; the last update was on 2022-09-22. See our newsletter update: https://globaldothealth.substack.com/p/tracking-the-2022-monkeypox-outbreak-8c3

No further edits to the G.h database will be made at this time. Your feedback regarding sourcing and timeliness of data will be helpful to consider as we update our sourcing strategy to capture lessons learned from the 2022 MPX outbreak. We appreciate your contributions!