corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping

Scraping the websites of Germany's local districts (Landkreise) for the latest coronavirus case numbers!
GNU General Public License v3.0

more stable date parser #51

Closed: dadosch closed this issue 4 years ago.

dadosch commented 4 years ago

Idea: move the date parsing into the helper. Callers pass in a string that contains a date somewhere and specify which occurrence to use (default: occurrence=0).

The helper first removes all spaces and non-breaking spaces, then tries a number of different regexes in turn. If all of them fail, it falls back to the "dateparser" library: https://dateparser.readthedocs.io/en/latest/
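A minimal sketch of the proposed helper, written in Python since dateparser is a Python library. The names `parse_date`, `PATTERNS` and the `occurrence` parameter are hypothetical, and the two regexes only cover the purely numeric formats. The fallback assumes dateparser's `dateparser.search.search_dates` with German as the language and returns `None` if the package is not installed.

```python
import re
from datetime import datetime
from typing import Optional


def _strip_spaces(text: str) -> str:
    # Remove ordinary spaces and non-breaking spaces (U+00A0).
    return text.replace(" ", "").replace("\u00a0", "")


# Hypothetical (regex, builder) pairs, tried in order on the space-free string.
# The first pattern that has enough matches for the requested occurrence wins.
PATTERNS = [
    # Date plus time, e.g. "26.03.2020,9.30Uhr" or "26.03.2020-17:00Uhr"
    (re.compile(r"(\d{1,2})\.(\d{1,2})\.(\d{4})[,;\-]?(\d{1,2})[.:](\d{2})Uhr"),
     lambda m: datetime(int(m[3]), int(m[2]), int(m[1]), int(m[4]), int(m[5]))),
    # Date only, e.g. "26.03.2020"
    (re.compile(r"(\d{1,2})\.(\d{1,2})\.(\d{4})"),
     lambda m: datetime(int(m[3]), int(m[2]), int(m[1]))),
]


def parse_date(text: str, occurrence: int = 0) -> Optional[datetime]:
    compact = _strip_spaces(text)
    for pattern, build in PATTERNS:
        matches = list(pattern.finditer(compact))
        if len(matches) > occurrence:
            return build(matches[occurrence])
    # Fallback: dateparser, if installed. It gets the original text,
    # because it relies on the spaces between words.
    try:
        from dateparser.search import search_dates
    except ImportError:
        return None
    found = search_dates(text, languages=["de"])
    if found and len(found) > occurrence:
        return found[occurrence][1]  # (matched substring, datetime) pairs
    return None
```

Listing the more specific date-plus-time pattern first keeps the date-only pattern from winning on strings that also carry a time.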

Any comments or feedback?

Date formats seen so far ("Stand" means "as of", "Uhr" means "o'clock"):

Stand 26.03.2020, 9.30 Uhr
Stand 26.03.2020
Stand: 26. März, 15 Uhr
Stand: 26.3.2020, 17.30 Uhr
Stand 26.03.2020 - 17:00 Uhr
Stand: 27. März 2020; 9.30 Uhr
…
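Once whitespace is removed, a single regex can plausibly cover all six formats listed above. This sketch makes several assumptions: every string starts with "Stand"; months appear either as numbers ("03.") or as German month names ("März"); a missing year is filled from a `default_year` argument; and a missing time becomes midnight. `parse_stand`, `STAND_RE` and `default_year` are hypothetical names, not part of the project.

```python
import re
from datetime import datetime
from typing import Optional

# German month names, lower-cased for lookup.
MONTHS = {
    "januar": 1, "februar": 2, "märz": 3, "april": 4, "mai": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10, "november": 11,
    "dezember": 12,
}

# Applied to the space-free string, e.g. "Stand:26.3.2020,17.30Uhr"
# or "Stand:26.März,15Uhr". Year and time are optional.
STAND_RE = re.compile(
    r"Stand:?"
    r"(?P<day>\d{1,2})\."
    r"(?P<month>\d{1,2}\.|[A-Za-zäÄ]+)"  # numeric "03." or a name like "März"
    r"(?P<year>\d{4})?"
    r"(?:[,;\-]?(?P<hour>\d{1,2})(?:[.:](?P<minute>\d{2}))?Uhr)?",
    re.IGNORECASE,
)


def parse_stand(text: str, default_year: Optional[int] = None) -> Optional[datetime]:
    # Same normalisation as proposed: drop spaces and non-breaking spaces.
    compact = text.replace(" ", "").replace("\u00a0", "")
    m = STAND_RE.search(compact)
    if m is None:
        return None
    raw_month = m["month"]
    if raw_month.endswith("."):
        month = int(raw_month[:-1])
    else:
        month = MONTHS.get(raw_month.lower())
        if month is None:
            return None  # unknown month word
    if m["year"]:
        year = int(m["year"])
    else:
        # Assumption: a missing year means default_year, else the current year.
        year = default_year if default_year is not None else datetime.now().year
    hour = int(m["hour"]) if m["hour"] else 0
    minute = int(m["minute"]) if m["minute"] else 0
    return datetime(year, month, int(m["day"]), hour, minute)
```

Falling back to the current year is fragile around New Year. Passing the year of the scrape date as `default_year` is probably safer for strings like "Stand: 26. März, 15 Uhr".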