covidatlas / li

Next-generation serverless crawler for COVID-19 data

Ability to report non-failure issues within a source's scraper #31

Open lazd opened 4 years ago

lazd commented 4 years ago

Description

It should be possible to report non-failure issues within a source's scraper:

Basically, anything we would otherwise log with console.log or console.warn should be passed through some sort of reporter object so it can be surfaced in logs and on a dashboard.

Perhaps there could be categories of errors, a debug object could be passed, etc. I'm not sure yet, but it could look something like this:

report(report.STALE_DATA, {
  updateDate: date,
  expectedDate: scrapeDate
});
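
For illustration only, here is a minimal sketch of what such a reporter might look like; the report function, the issue codes, and the in-memory issues array are hypothetical names, not anything that exists in the codebase today:

// Hypothetical reporter sketch; all names here are illustrative only
const issues = []

const report = (code, details = {}) => {
  // Record a structured, non-failure issue so it can be rolled up in logs or a dashboard
  issues.push({ code, details, reportedAt: new Date().toISOString() })
}

// Illustrative issue codes, attached to the function as in the example above
report.STALE_DATA = 'STALE_DATA'
report.TOTALS_DIFFER = 'TOTALS_DIFFER'

module.exports = { report, issues }

A scraper could then call report(report.STALE_DATA, ...) as above, and whatever runs the scraper could pick up the collected issues and attach them to the crawl results.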
jzohrab commented 4 years ago

Good thoughts, some questions:

My feeling is that scrapers should not do this, for reasons of atomicity. Scrapers just scrape, and that's it. I might be missing something from your question.

In terms of "totals from source differ": is this the scenario where the source reports a sum that disagrees with the individual data it publishes? I can see that being the case.

The app should just write to the console. For logging and analysis, we'll probably want some kind of standardized message/warning format, with codes, so that log aggregation and rollup can find all of these things. Emitting warnings through structured logging would probably be sufficient, as sketched below.
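
As a rough sketch of that kind of structured warning (the helper name and field layout are assumptions, not an agreed format), each warning could be emitted as one JSON line so aggregation and rollup can filter on the code:

// Hypothetical structured-warning helper; field names are illustrative only
const warn = (code, data = {}) => {
  // One JSON object per line keeps warnings easy for log aggregation to parse
  console.warn(JSON.stringify({
    level: 'warn',
    code, // e.g. 'STALE_DATA', 'TOTALS_DIFFER'
    ...data,
    timestamp: new Date().toISOString()
  }))
}

warn('STALE_DATA', { updateDate: '2020-04-01', expectedDate: '2020-04-05' })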

I think we should look more into logging, both for sanity during operations and for log aggregation and analysis. I'll open another issue for that.

jzohrab commented 4 years ago

Added https://github.com/covidatlas/li/issues/54 for more about logging.