CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

For Germany the numbers don't match the official numbers #826

Open asmaier opened 4 years ago

asmaier commented 4 years ago

As of today, the official number of total confirmed cases for Germany is 4838 (according to RKI, the German CDC): https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html

However, the board https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 shows a number of 5813.

What are your sources for Germany, and how do you explain the difference of nearly 1000 cases?

foursixnine commented 4 years ago

See page 2 of the Situationsberichte (it's also available in English), and then read the footnote: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/2020-03-15-en.pdf?__blob=publicationFile

Also read the terms of use of this repo/board, and keep in mind that it is not updated in real time; there is some lag.

b2m9 commented 4 years ago

See page 2 of the Situationsberichte (it's also available in English), and then read the footnote: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/2020-03-15-en.pdf?__blob=publicationFile

I actually assumed they just got a number wrong, so I was curious about your link. Yet, I can't see what you want to point out. RKI reports 4,838 manually reported cases and 4,195 electronically reported cases. What am I missing here?

Anyhow, I'm sure Johns Hopkins will correct the numbers once they are back at work.

pratik-bhandari commented 4 years ago

From the German CDC (RKI) Situationsberichte, page 2

A total of 4,838 (+1,043) laboratory-confirmed cases of coronavirus disease 2019 (COVID-19) have been detected in Germany since 27/01/2020, of which 4,195 were electronically reported to and validated at the RKI. So far, 12 (+4) deaths related to COVID-19 diseases were reported.

This gives a total of 4838+1043=5881 cases. This is the official number. The number reported in this repo is 5883.

What the number in parentheses (i.e. 1043) represents is not explained even in the RKI document, so for the sake of clarity and consistency it would be better if this repo used only one of these from now on: manually reported cases (4838) or electronically reported cases (4195) :)

b2m9 commented 4 years ago

This gives a total of 4838+1043=5881 cases. This is the official number. The number reported in this repo is 5883.

What the number in the parenthesis (i.e. 1043) represents is not explained even in the RKI document

Really? Because they clearly state at the beginning that the blue numbers are the differences compared to the previous report. So it is 4838 in total as of 15 March, which is 1043 cases more than the previous report. Hence, the big fat number 4838 of total cases on the cover of the report.

From the report:

–Changes since the last report have been marked blue in the text

Correct me if I'm wrong but this is how I read it.

pratik-bhandari commented 4 years ago

Because they clearly state at the beginning that the blue numbers are the differences compared to the previous report.

That's right. Thank you for pointing it out.

1043 is, therefore, the increase in the number of cases from 03/14 to 03/15. This means that adding 1043 and 4838 to get a total of 5881 doesn't make sense! Thus the total number of cases remains 4838 (or 4195 if one considers electronically submitted cases only) and so does the issue here.
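To make the arithmetic concrete, here is a minimal Python sketch. The variable names are made up for illustration; the two figures are the ones quoted from the RKI report of 15 March, and the previous total is only what those two figures imply.

```python
# Figures quoted from the RKI situation report of 2020-03-15.
total_reported = 4838     # cumulative laboratory-confirmed cases
change_since_last = 1043  # the blue "(+1,043)": increase since the previous report

# The increase is already contained in the cumulative total, so the
# previous report's total follows by subtracting, not by adding.
previous_total = total_reported - change_since_last

print(previous_total)                      # 3795
print(previous_total + change_since_last)  # 4838
```

Adding the two numbers instead would count the 1043 new cases twice.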

asmaier commented 4 years ago

As of today the official number according to RKI is 6012 (see also https://qap.ecdc.europa.eu/public/extensions/COVID-19/COVID-19.html). WHO is one day behind: 4838 (https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd).

The German Wikipedia entry (https://de.wikipedia.org/wiki/COVID-19-Epidemie_in_Deutschland) lists the number from RKI: 6012. But the English Wikipedia entry (https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Germany) takes its numbers from https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/ (which claims to also parse information from the health ministries of the German federal states, e.g. https://www.mags.nrw/coronavirus-fallzahlen-nrw), which shows a much higher number: 7974.

However, Johns Hopkins lists 7689 cases, which is yet another number. It is really unclear to me where you get it from.

jgehrcke commented 4 years ago

Here I explain why the RKI numbers sometimes lag 1-2 days behind what the individual ministries of health in Germany actually publish: https://gehrcke.de/2020/03/covid-19-http-api-for-german-case-numbers/

That blog post also describes how to use an HTTP API for getting the current case count, based on zeit.de data.

Say that at a given moment you go around and ask all the individual ministries of health in Germany for their numbers (you don't actually need to ask them; they have their own websites and sometimes publish the current data there). You then get a credible case count for that moment in time, which is what zeit.de seems to be doing.

asmaier commented 4 years ago

By the way, the data from zeit.de (a German newspaper) can be retrieved from this URL: https://interactive.zeit.de/cronjobs/2020/corona/data.json. This is probably the most up-to-date data for Germany that can be retrieved via an open API.
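For anyone who wants to try it, a minimal Python sketch for downloading and parsing a JSON document using only the standard library. The helper name `fetch_json` is made up for illustration, and nothing here assumes anything about the structure of the document itself.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download `url` and parse the response body as JSON.

    Raises OSError (which includes urllib.error.URLError) on network
    problems and ValueError if the body is not valid UTF-8 JSON.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json("https://interactive.zeit.de/cronjobs/2020/corona/data.json")` would return the parsed document, provided the URL still serves JSON at the time of the request.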

jgehrcke commented 4 years ago

This is probably the most up-to-date data for Germany that can be retrieved via an open API.

It's not "open", though, as it is an implementation detail in ZEIT ONLINE's architecture. We might or might not be allowed to use it, and they might change the implementation details at any time.

jgehrcke commented 4 years ago

If you'd like to have a look, here I announce an HTTP API that provides time series data for individual German states: https://gehrcke.de/2020/03/covid-19-http-api-german-states-timeseries/

https://github.com/jgehrcke/covid-19-germany-gae

Feedback welcome!

asmaier commented 4 years ago

@jgehrcke Are you affiliated with zeit.de? I don't understand why you say the URL https://interactive.zeit.de/cronjobs/2020/corona/data.json is not open when it clearly is. If they did not want everybody to access that API, they should hide it behind a firewall, protect it with a password, or demand an API token. As of now this URL seems to be accessible to everyone in the world.

jgehrcke commented 4 years ago

@jgehrcke Are you affiliated with zeit.de? I don't understand why you say the URL https://interactive.zeit.de/cronjobs/2020/corona/data.json is not open

Hey @asmaier, I am not affiliated with ZEIT ONLINE. The URL is publicly accessible, yes. But that's about it. And of course I am not telling you that you should not use it. I was merely giving a heads-up, maybe a kind warning: an "open API" is something vastly different. An open API by more modern standards has a clear declaration of purpose and intent ("this is here for you to use!"), has a clear interface specification, maintains interface stability, and uses a data format that is easily consumed by common tools. I looked into their JSON document format quite a bit, and parsing it is a bit of a pain. ZEIT ONLINE rather obviously built this only to be consumed by the mapping JavaScript code on their website; it was not originally meant to be consumed by other parties.
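To illustrate the practical consequence: a consumer of an undocumented feed like this may want to check the layout explicitly, so that an upstream change fails loudly instead of silently yielding wrong numbers. A minimal Python sketch of that idea follows. The helper `extract_total` and the keys `totals` and `count` are hypothetical and are not taken from the real ZEIT ONLINE document.

```python
def extract_total(document, path=("totals", "count")):
    """Follow `path` through nested dicts and return the integer found there.

    The default path is a hypothetical example, not the real feed layout.
    Raises ValueError as soon as the document deviates from what is expected.
    """
    node = document
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"unexpected document layout: missing {key!r}")
        node = node[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(node, bool) or not isinstance(node, int):
        raise ValueError(f"expected an integer, got {type(node).__name__}")
    return node


print(extract_total({"totals": {"count": 4838}}))  # 4838
```

If the publisher renames or moves a field, the call raises `ValueError` rather than returning a stale or meaningless value.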

Again, I have a strong opinion about what an "open API" is and what it is not, but I am certainly not telling you what to do. Sorry if I gave that impression :-).

asmaier commented 4 years ago

So I figured out that Berliner Morgenpost has an "Open API" similar to the one from ZEIT ONLINE:

Historical case data for Germany, Europe and the World:

Latest data for countries and federal states in Germany:

Regional data for Germany:

The nice thing is that they store a source URL for each data point.

jgehrcke commented 4 years ago

@asmaier these are indeed great data sources. They look well-researched, with sources and timestamps.

In https://github.com/jgehrcke/covid-19-germany-gae I have changed the /now endpoint (https://covid19-germany.appspot.com/now) to be based on multiple sources now, including Berliner Morgenpost, and to report the more recent case count. Maybe that is useful. Cheers!
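The idea of reporting the more recent count can be sketched in a few lines of Python. This is only an illustration of the selection step, not necessarily how the service does it; the record layout (`source`, `cases`, `time`) and the sample values are made up.

```python
from datetime import datetime, timezone


def latest_record(records):
    """Return the record with the newest `time` value."""
    if not records:
        raise ValueError("no records to choose from")
    return max(records, key=lambda record: record["time"])


# Made-up sample values, for illustration only.
samples = [
    {"source": "source-a", "cases": 100,
     "time": datetime(2020, 3, 16, 8, 0, tzinfo=timezone.utc)},
    {"source": "source-b", "cases": 120,
     "time": datetime(2020, 3, 16, 20, 0, tzinfo=timezone.utc)},
]

print(latest_record(samples)["source"])  # source-b
```

Using timezone-aware timestamps keeps the comparison meaningful when sources publish in different time zones.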