corona-zahlen-landkreis / corona_landkreis_fallzahlen_scraping

Scraping Germany's local districts websites for newer corona-case-numbers!
GNU General Public License v3.0
17 stars 9 forks

Handle parsing error #70

Open dasmur opened 4 years ago

dasmur commented 4 years ago

Component: crawler

Problem: I tried to get an overview of the crawler's current status, specifically the number of districts that are parsed successfully. By running some scripts at random, I noticed that several of them can no longer extract the current case numbers (probably because the structure of the corresponding website has changed). To identify failing scripts, it would be nice to have some kind of common error signalling.

Suggestion: My first idea is based on UNIX exit codes: simply return 1 if the parser is not able to extract the data.

I already included this approach in one script which I will link to this issue.
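A minimal sketch of the idea, assuming the parsers are Python scripts. The sample HTML, the regular expression, and the names `SAMPLE_HTML`, `extract_case_count` and `main` are hypothetical and not taken from the repository; a real parser would download the district website instead of using a fixed string.

```python
import re
import sys

# Hypothetical page snippet; a real parser would fetch the district website.
SAMPLE_HTML = "<p>Infizierte: 123</p>"


def extract_case_count(html):
    """Return the case count found in html, or None if the pattern is missing."""
    match = re.search(r"Infizierte:\s*(\d+)", html)
    return int(match.group(1)) if match else None


def main(html=SAMPLE_HTML):
    """Print the case count and return a UNIX-style exit status."""
    cases = extract_case_count(html)
    if cases is None:
        # Signal the parsing failure to the caller instead of printing nothing.
        print("parsing error: case count not found", file=sys.stderr)
        return 1
    print(cases)
    return 0
```

A script built this way would end with `sys.exit(main())`, so that the shell or a wrapper sees 0 on success and 1 when the expected pattern has disappeared from the page.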

dasmur commented 4 years ago

Of course, even within this script, parsing errors could still occur in later parts of the code, but it should be enough to convey the idea.

If this were introduced into all scripts, it would be quite easy to get an overview.

dasmur commented 4 years ago

OK, while my suggestion (using UNIX exit codes to make failing parsers exit early) might be a good way to improve the overall coding style, it is not really necessary to answer the question:

Q: How many scripts are currently able to extract district case numbers?

The answer: 23 of the 62 scripts are currently running without errors (defined as an exit code of 0). In other words, 39 parsers are currently failing with an exit code of 1.
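Such a count could be obtained with a small runner like the sketch below. It assumes the parsers are standalone Python scripts in a single directory; the function names `run_parsers` and `summarize`, the 60-second timeout, and the choice to count a timeout as a failure are assumptions made for illustration. An uncaught Python exception already makes the interpreter exit with status 1, which is why the count works even without explicit exit codes in each script.

```python
import subprocess
import sys
from pathlib import Path


def run_parsers(script_dir, timeout=60):
    """Run every *.py file in script_dir and map script name to exit code."""
    results = {}
    for script in sorted(Path(script_dir).glob("*.py")):
        try:
            completed = subprocess.run(
                [sys.executable, str(script)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            results[script.name] = completed.returncode
        except subprocess.TimeoutExpired:
            # Assumption: a parser that hangs is counted as failed.
            results[script.name] = 1
    return results


def summarize(results):
    """Return (number of scripts with exit code 0, number of all others)."""
    ok = sum(1 for code in results.values() if code == 0)
    return ok, len(results) - ok
```

Calling `summarize(run_parsers("path/to/scripts"))` returns a pair of the successful and failing script counts, and the `results` dictionary shows which individual parsers need attention.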