dkobak / covid-underdispersion

Testing reported Covid-19 deaths and cases for Poisson underdispersion

What about running it on JHU CSSE data? #1

Open Artoria2e5 opened 2 years ago

Artoria2e5 commented 2 years ago

Thanks for the banger article! The good people at Johns Hopkins CSSE also maintain a COVID dataset. Among other things, it is:

dkobak commented 2 years ago

Thanks, it's a good question.

So in general I prefer WHO data to JHU data because WHO uses the data by date of death as opposed to date of reporting, for some countries (mostly European) that share these data. Here is a figure from last year that illustrates that:

who_vs_jhu_daily

Look e.g. at Sweden: JHU has a lot of within-week fluctuations because Sweden reports fewer deaths on each weekend. But WHO uses data by date of death, so there are no within-week fluctuations at all. For such countries WHO data are clearly better.
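A rough way to quantify such a weekend reporting dip is to compare the mean count at each position in the week with the overall mean. The sketch below is only an illustration and is not taken from the repository: the function name `weekday_profile` is hypothetical, the series are made-up toy numbers, and the series is assumed to start on a Monday and span whole weeks.

```python
from typing import List, Sequence


def weekday_profile(daily: Sequence[float]) -> List[float]:
    """Mean count at each of the 7 week positions, relative to the overall mean.

    Assumes at least 7 entries and a non-zero overall mean.
    Values near 1.0 everywhere indicate no within-week pattern.
    """
    overall = sum(daily) / len(daily)
    profile = []
    for d in range(7):
        values = daily[d::7]  # every 7th day, starting at week position d
        profile.append((sum(values) / len(values)) / overall)
    return profile


# Toy series (not real data): two weeks, assumed to start on a Monday.
by_report_date = [10, 10, 10, 10, 10, 3, 3] * 2  # fewer deaths reported on weekends
by_death_date = [8] * 14                         # no within-week pattern

print(weekday_profile(by_report_date))  # weekend positions fall well below 1.0
print(weekday_profile(by_death_date))   # flat profile
```

On real data the profile would be noisier, so a dip on Saturdays and Sundays would have to be judged against the day-to-day variation rather than read off directly.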

That said, Boudewijn Roukema pointed out to me recently that for some other countries JHU data are better and less jumpy: it seems that on some days WHO simply did not update the time series, so it reports 0 deaths on those days. One example is Algeria. Here is WHO:

Screenshot from 2022-03-15 12-07-26

While here is JHU (unsmoothed):

coronavirus-data-explorer(7)(1)

Here JHU clearly has better data.

My take on that is that the best approach may be to grab both data sources and then choose the less noisy source for each country... But I have not tried that yet.
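A minimal sketch of what such a per-country choice could look like, assuming "noise" is measured as the sum of absolute day-to-day changes divided by the total count. This metric, the function names `roughness` and `pick_less_noisy`, and the toy numbers are all assumptions made for illustration, not something from the repository or the paper.

```python
from typing import Dict, Sequence


def roughness(daily: Sequence[float]) -> float:
    """Sum of absolute day-to-day changes, divided by the total count."""
    total = sum(daily)
    if total == 0:
        return float("inf")  # an all-zero series carries no information
    jumps = sum(abs(b - a) for a, b in zip(daily, daily[1:]))
    return jumps / total


def pick_less_noisy(who: Dict[str, Sequence[float]],
                    jhu: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """For each country present in both sources, name the smoother source."""
    choice = {}
    for country in sorted(who.keys() & jhu.keys()):
        smoother_is_who = roughness(who[country]) <= roughness(jhu[country])
        choice[country] = "WHO" if smoother_is_who else "JHU"
    return choice


# Made-up toy numbers, for illustration only (not real WHO or JHU data).
who = {"A": [5, 6, 5, 6, 5, 6, 5], "B": [4, 0, 8, 4, 0, 8, 4]}
jhu = {"A": [7, 8, 2, 1, 7, 8, 5], "B": [4, 4, 4, 4, 4, 4, 4]}

print(pick_less_noisy(who, jhu))  # {'A': 'WHO', 'B': 'JHU'}
```

Country "B" mimics the skipped-day pattern (a zero followed by a doubled count), which this metric penalizes. One caveat: picking the smoother source per country is a selection step, so it could itself bias a later underdispersion test toward smoother series.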


Regarding sub-country-level data, I did the analysis on US states and Russian federal regions (see my Python notebook), grabbing the data from CDC and Russian authorities directly.


I am re-opening this issue, because I think it'd be great to run it on JHU and see what happens.