covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Feature: Add "caveats" for scrapers #530

Status: Open. jzohrab opened this issue 4 years ago.

jzohrab commented 4 years ago

Description

In some scrapers, we're making justifiable assumptions about how to interpret the data (e.g., covidatlas/coronadatascraper#572 - KOR quarantines). We could hardcode these caveats in the affected scrapers and perhaps include them in the source output, e.g.:

[
  {
    "county": "Los Angeles County",
    "state": "California",
    "country": "United States",
...
    "url": "http://www.publichealth.lacounty.gov/media/Coronavirus/",
    "cases": 0,
    "deaths": 0,
    "caveats": [
      "some_data_here"
    ],
...
  }
]
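To illustrate the per-record variant, here is a minimal Python sketch of a scraper attaching its hardcoded caveats to an output record like the one above. `add_caveats` and `LA_CAVEATS` are hypothetical names, not part of the li codebase, and the caveat text is the same placeholder used in the JSON.

```python
from typing import Any

# Hypothetical hardcoded caveats for one scraper; the text is the same
# placeholder as "some_data_here" in the JSON example.
LA_CAVEATS = ["some_data_here"]


def add_caveats(record: dict[str, Any], caveats: list[str]) -> dict[str, Any]:
    """Return a copy of a scraped record with a "caveats" list attached.

    Caveats already on the record are kept, and duplicates are skipped.
    """
    merged = list(record.get("caveats", []))
    for caveat in caveats:
        if caveat not in merged:
            merged.append(caveat)
    return {**record, "caveats": merged}


record = {
    "county": "Los Angeles County",
    "state": "California",
    "country": "United States",
    "url": "http://www.publichealth.lacounty.gov/media/Coronavirus/",
    "cases": 0,
    "deaths": 0,
}
print(add_caveats(record, LA_CAVEATS)["caveats"])  # ['some_data_here']
```

Returning a copy rather than mutating the record leaves the raw scraped data untouched, and skipping duplicates means the same caveat can safely be applied more than once.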

Perhaps these caveats could also be rolled up to higher-level locations (state, country), each prefixed with the location it applies to:

    "caveats": [
        "LA, CA: some_data_here",
        "PA: penn. caveats here"
    ]
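The roll-up could work roughly like this sketch, which assumes the caller passes only the records belonging to the target location (e.g. all US records when building the country-level entry). `roll_up_caveats` and `LEVELS` are hypothetical; the prefixes use full location names, so producing abbreviations like "LA, CA" would need an extra lookup table not shown here.

```python
from typing import Any

# Location levels from most to least specific, matching the fields in the
# JSON example (hypothetical constant).
LEVELS = ("county", "state", "country")


def roll_up_caveats(records: list[dict[str, Any]], level: str) -> list[str]:
    """Collect caveats from lower-level records into one list for `level`.

    Each caveat is prefixed with the parts of its record's location that are
    more specific than `level`; records without caveats contribute nothing.
    """
    finer = LEVELS[: LEVELS.index(level)]
    rolled: list[str] = []
    for record in records:
        label = ", ".join(record[key] for key in finer if key in record)
        for caveat in record.get("caveats", []):
            entry = f"{label}: {caveat}" if label else caveat
            if entry not in rolled:
                rolled.append(entry)
    return rolled


records = [
    {"county": "Los Angeles County", "state": "California",
     "country": "United States", "caveats": ["some_data_here"]},
    {"state": "Pennsylvania", "country": "United States",
     "caveats": ["penn. caveats here"]},
]
print(roll_up_caveats(records, "country"))
# ['Los Angeles County, California: some_data_here', 'Pennsylvania: penn. caveats here']
```

Prefixing only with the finer-grained location parts keeps entries short: at the country level there is no need to repeat "United States" on every line.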

Why do you need this feature or component?

To publicize the assumptions behind the scraped data.

Notes

For testing and regression checks, I don't think we'd need to check the contents of the caveats field, as they might change over time. One sanity check would be enough.
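A sanity check along these lines could verify the field's shape without pinning its wording. This is a hypothetical helper, assuming caveats are optional and, when present, a list of non-empty strings.

```python
from typing import Any


def check_caveats_shape(records: list[dict[str, Any]]) -> None:
    """Sanity-check the "caveats" field without comparing its contents.

    The field is optional; when present it must be a list of non-empty
    strings. The wording of individual caveats is deliberately not checked,
    since it may change over time.
    """
    for record in records:
        caveats = record.get("caveats")
        if caveats is None:
            continue  # records without caveats are fine
        if not isinstance(caveats, list):
            raise AssertionError(
                f"caveats must be a list, got {type(caveats).__name__}"
            )
        for caveat in caveats:
            if not isinstance(caveat, str) or not caveat.strip():
                raise AssertionError(f"invalid caveat: {caveat!r}")


# Passes: one record without caveats, one with a well-formed list.
check_caveats_shape([{"cases": 0}, {"cases": 0, "caveats": ["some_data_here"]}])
```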

shaperilio commented 4 years ago

Yes! This also came up in the discussion of the Panama scraper: Panama's granularity level is akin to "boroughs" (smaller than cities), and we don't have any way to store that. So if we call them counties, that detail could go in a field like this.