covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Feature: Add "caveats" for scrapers #414

Closed: jzohrab closed this issue 4 years ago

jzohrab commented 4 years ago

Original issue: https://github.com/covidatlas/coronadatascraper/issues/707, transferred here on Tuesday, Apr 07, 2020 at 18:54 GMT


Description

In some scrapers, we're making justifiable assumptions about how to interpret the data (e.g., #572 - KOR quarantines). We could hardcode these caveats in the scrapers themselves and perhaps include them in the source output, e.g.:

[
  {
    "county": "Los Angeles County",
    "state": "California",
    "country": "United States",
...
    "url": "http://www.publichealth.lacounty.gov/media/Coronavirus/",
    "cases": 0,
    "deaths": 0,
    "caveats": [
        "some_data_here"
    ],
...
  }
]
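As a minimal sketch of the "hardcode the caveats in the scraper" idea, the Python below attaches a scraper's fixed caveat list to every location record it emits. It assumes records are plain dicts shaped like the JSON above; `SCRAPER_CAVEATS` and `add_caveats` are hypothetical names for illustration, not part of the crawler's code.

```python
from typing import Any, Dict, List

# Hypothetical: caveats hardcoded in a scraper module, as the issue proposes.
SCRAPER_CAVEATS = [
    "some_data_here",
]


def add_caveats(locations: List[Dict[str, Any]], caveats: List[str]) -> List[Dict[str, Any]]:
    """Return copies of the scraped location records with a 'caveats' list attached.

    Caveats already present on a record are kept, and duplicates are skipped.
    The input records are not modified.
    """
    result = []
    for loc in locations:
        merged = list(loc.get("caveats", []))
        for caveat in caveats:
            if caveat not in merged:
                merged.append(caveat)
        result.append({**loc, "caveats": merged})
    return result


records = [{"county": "Los Angeles County", "state": "California",
            "country": "United States", "cases": 0, "deaths": 0}]
print(add_caveats(records, SCRAPER_CAVEATS)[0]["caveats"])
```

Returning copies rather than mutating the records keeps the raw scrape output unchanged, which may make it easier to compare scrapes with and without caveats.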

Perhaps these caveats could be rolled up to higher levels, with each entry prefixed by the location it applies to:

    "caveats": [
        "LA, CA: some_data_here",
        "PA: penn. caveats here"
    ]
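The roll-up could work roughly as sketched below: collect every location's caveats into a single list, prefixing each one with a label for its location. This is a hypothetical sketch; the issue's example uses abbreviations such as "LA, CA", which would need a lookup table not shown here, so `location_label` simply joins the most specific location fields present.

```python
from typing import Any, Dict, List


def location_label(loc: Dict[str, Any]) -> str:
    """Hypothetical label: the city/county/state fields present, joined by ', '.

    Falls back to the country (or 'unknown') when no finer field is set.
    """
    parts = [loc[k] for k in ("city", "county", "state") if loc.get(k)]
    return ", ".join(parts) if parts else loc.get("country", "unknown")


def roll_up_caveats(locations: List[Dict[str, Any]]) -> List[str]:
    """Gather all locations' caveats into one list, each prefixed with its location label.

    Entries keep the input order; exact duplicates are dropped.
    """
    rolled: List[str] = []
    seen = set()
    for loc in locations:
        label = location_label(loc)
        for caveat in loc.get("caveats", []):
            entry = f"{label}: {caveat}"
            if entry not in seen:
                seen.add(entry)
                rolled.append(entry)
    return rolled


locs = [
    {"county": "Los Angeles County", "state": "California",
     "country": "United States", "caveats": ["some_data_here"]},
    {"state": "Pennsylvania", "country": "United States",
     "caveats": ["penn. caveats here"]},
    {"state": "Texas", "country": "United States"},  # no caveats, contributes nothing
]
print(roll_up_caveats(locs))
# ['Los Angeles County, California: some_data_here', 'Pennsylvania: penn. caveats here']
```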

Why do you need this feature or component?

To publicize the assumptions we make when interpreting source data.

Notes

For testing/regression, I don't think we'd need to check the caveats field, as it might change over time. One sanity check would be enough.
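One possible form of that single sanity check is sketched below: it checks only the shape of the `caveats` field (a list of non-empty strings whenever the field is present) and deliberately ignores the wording, since that may change over time. The function name is hypothetical.

```python
from typing import Any, Dict, List


def caveats_are_well_formed(locations: List[Dict[str, Any]]) -> bool:
    """Sanity check: any 'caveats' field must be a list of non-empty strings.

    Records without a 'caveats' field pass. The caveat text itself is not
    compared against expected values.
    """
    for loc in locations:
        if "caveats" not in loc:
            continue
        caveats = loc["caveats"]
        if not isinstance(caveats, list):
            return False
        if not all(isinstance(c, str) and c.strip() for c in caveats):
            return False
    return True


print(caveats_are_well_formed([{"caveats": ["some_data_here"]}, {"cases": 0}]))  # True
print(caveats_are_well_formed([{"caveats": "some_data_here"}]))  # False: not a list
```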

jzohrab commented 4 years ago

(Transferred comment)

Yes! This also came up in the discussion of the Panama scraper, because Panama's granularity level is akin to "boroughs" (smaller than cities) and we don't have any way to store that. So if we call them counties, that detail could go in a field like this.