corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping

Scraping the websites of Germany's local districts for newer corona case numbers!
GNU General Public License v3.0

A note on data inconsistencies #44

Closed dadosch closed 4 years ago

dadosch commented 4 years ago

I have noticed that there are data inconsistencies, for example in the Kommunen (municipalities) of Soest. These do not come from wrong parsing, but from changes to the official numbers.

We just use the reported numbers. It doesn't matter if they go down between reports or within the same date (as long as they go down in the official numbers). It is up to the data consumer how to handle these cases.
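For data consumers, here is a minimal sketch of how such downward corrections could be spotted. It assumes the rows are already loaded as `(timestamp, count)` pairs in file order and that the counts are cumulative; the function name `find_decreases` and the sample values are hypothetical and not part of this repository.

```python
def find_decreases(rows):
    """Return (previous, current) row pairs where the reported count went down.

    `rows` is a list of (timestamp, count) tuples in file order.
    The rows themselves are not modified; deciding what to do with a
    decrease (keep, smooth, drop) is left to the caller.
    """
    return [
        (prev, cur)
        for prev, cur in zip(rows, rows[1:])
        if cur[1] < prev[1]
    ]


# Hypothetical sample data: the count is corrected downwards on the same date.
sample = [
    ("2020-04-01", 10),
    ("2020-04-02 09:00", 12),
    ("2020-04-02 15:30", 11),
]
decreases = find_decreases(sample)
```

The sketch only reports the affected pairs, since the scraped CSVs are meant to mirror the official numbers as published.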

dadosch commented 4 years ago

Also, sometimes a time is included, sometimes not.

Do not add wrong, nonexistent times (like 12:00 or 00:00).

Do not add timezone data; always use local time in the format %Y-%m-%d %H:%M or %Y-%m-%d.
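The two formats above are `strftime` patterns. A minimal sketch of how a parser could apply the rule follows, assuming the scraped value is already a Python `date` (no time published) or a `datetime` in local time (time published). The helper name `format_timestamp` is hypothetical and not taken from the repository.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"
DATETIME_FORMAT = "%Y-%m-%d %H:%M"


def format_timestamp(value):
    """Format a scraped timestamp without inventing a time of day.

    A datetime (assumed to be local time) is written with hours and minutes.
    A plain date is written as a date only, with no placeholder time.
    """
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.strftime(DATETIME_FORMAT)
    if isinstance(value, date):
        return value.strftime(DATE_FORMAT)
    raise TypeError("expected a date or datetime")


with_time = format_timestamp(datetime(2020, 4, 3, 14, 30))
without_time = format_timestamp(date(2020, 4, 3))
```

Keeping `date` and `datetime` apart up to the point of formatting avoids padding a date-only report with 00:00 or 12:00.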

dadosch commented 4 years ago

If a Landkreis (LK, district) changes its website so that it becomes unparsable, the corresponding CSV will not be deleted, but it will no longer be updated automatically.