corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping

Scraping Germany's local districts websites for newer corona-case-numbers!
GNU General Public License v3.0

Hello from covid-19-germany-gae! :) #39

Closed jgehrcke closed 4 years ago

jgehrcke commented 4 years ago

This looks neat. First of all, I'd love to say that the premise is agreeable, as in:

The official data from the RKI and the federal states is in part several days old. What could be more obvious than querying this data directly from the websites of the districts (Landkreise)? There it is "directly at the source" and most up to date.

On that note, I would appreciate your feedback on my article here: https://gehrcke.de/2020/03/ard-zdf-covid-19-fallzahlen/

Do you agree with the details in there? I'd like a keen, critical eye on it. If we agree on all the details, let's start spreading this point of view a little more coherently.

Secondly, I would actually like to encourage you to write this README in English. Several nation-wide initiatives already suffer from the fact that they cannot be followed easily by internationals. About that please see https://github.com/jgehrcke/covid-19-germany-gae/issues/3 and the subsequent links. I am basically advocating for @Bost's point of view, who has relentlessly tried to connect people across nations.

Lastly for now, I would like to draw your attention to https://github.com/jgehrcke/covid-19-germany-gae/ itself. I think there is potential for your data source to become the primary data source for https://github.com/jgehrcke/covid-19-germany-gae/. Obviously quite a bit of work would need to happen towards that. I just want to mention what could be on the horizon for us if we collaborate.

Cheers,

Jan-Philip

jgehrcke commented 4 years ago

Oh, another thing I hope you have already seen: the discussion in https://github.com/CSSEGISandData/COVID-19/issues/1008. Quite insightful, I think.

dadosch commented 4 years ago

ok, the readme is in english now :)

On your article: I cannot assess what the JHU data is based on. I assume they use some fancy ML process, but you have to keep in mind that even the data of each bundesland is multiple days old on each website (of the bundesland)...

jgehrcke commented 4 years ago

@dadosch thanks for the feedback!

but you have to keep in mind that even the data of each bundesland is multiple days old on each website (of the bundesland)...

So far I assumed a "1 day delay" between the Gesundheitsaemter and the Landesministerium. "Multiple days" is an interesting hypothesis, and of course it is not unlikely. But I would really appreciate it if you could elaborate a little on this! Do you maybe have an example? Genuinely curious, because this topic is really important, and all of us deserve to understand it a little better.

dadosch commented 4 years ago

One example: the table of the Landesministerium BW lists 113 cases for Ostalbkreis (https://sozialministerium.baden-wuerttemberg.de/de/service/presse/pressemitteilung/pid/covid-19-zahl-der-infizierten-im-land-steigt-auf-3818/ xls at the very bottom). If you compare this with ostalbkreis' own website, this corresponds to data from approx. 2020-03-18 (there were 109 cases then, see web.archive.org), so there is a delay of approx. 4 days here.
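
To make the comparison reproducible, here is a rough sketch of how such a delay could be estimated. The function name `estimate_reporting_lag` is hypothetical and not part of this repository. The district series contains only the two Ostalbkreis values mentioned in this thread (109 cases on 2020-03-18 from web.archive.org, 180 cases on 2020-03-22). It assumes cumulative case counts and simply picks the latest district date whose count does not exceed the state-level figure, so with sparse data the result is only as precise as the gaps between snapshots.

```python
from datetime import date


def estimate_reporting_lag(district_series, state_count, state_date):
    """Estimate how many days the state-level figure lags behind the district.

    district_series: dict mapping date -> cumulative cases on the district site.
    Returns (matched_date, lag_in_days), or None if every known district
    value is already higher than the state-level count.
    """
    # Keep only district snapshots that are not newer than the state report
    # and whose cumulative count is still at or below the state figure.
    candidates = [
        day
        for day, cases in district_series.items()
        if day <= state_date and cases <= state_count
    ]
    if not candidates:
        return None
    matched = max(candidates)
    return matched, (state_date - matched).days


# Values taken from this thread; everything else is illustrative.
ostalbkreis = {
    date(2020, 3, 18): 109,
    date(2020, 3, 22): 180,
}

result = estimate_reporting_lag(
    ostalbkreis, state_count=113, state_date=date(2020, 3, 22)
)
print(result)
```

With these two snapshots the state-level figure of 113 is matched to 2020-03-18, i.e. a lag of 4 days. With a denser district time series (for example from regular scraping) the estimate would become correspondingly tighter.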

jgehrcke commented 4 years ago

For people reading along, let me make this example a little more tangible.

with ostalbkreis' own website

https://www.ostalbkreis.de/sixcms/detail.php?_topnav=36&_sub1=31788&_sub2=32062&_sub3=292448&id=292450

It says (while I write this) - 180 cases, time: 22.03.2020 (no hour of the day)

At the same time the state site says - 113 cases, time: 22.03.2020, 15:00

Adding screenshots for 'evidence'.

state level: 113 cases for Ostalbkreis (upper right corner)

Screenshot from 2020-03-23 16-03-46

LK level: 180 cases for Ostalbkreis

Screenshot from 2020-03-23 16-05-52


Question: on the LK level: are these exclusively those cases that got a positive polymerase chain reaction (PCR) test?

dadosch commented 4 years ago

Question: on the LK level: are these exclusively those cases that got a positive polymerase chain reaction (PCR) test?

I have no evidence to say otherwise; it wouldn't make any sense to call the people in quarantine/tested "Erkrankte" (180 would be too few),