covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Cache-only scraper #407

Closed. jzohrab closed this issue 4 years ago.

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/671, transferred here on Sunday Apr 05, 2020 at 17:34 GMT


Description

Develop a scraper that ingests a list of sources and does nothing with them other than fetch and cache their content.
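To make the idea concrete, here is a minimal Python sketch of a cache-only pass. The names `cache_sources` and `default_fetch`, the `(source_id, url)` input shape, and the `cache_dir/<source_id>/<timestamp>-<hash>.raw` layout are all hypothetical. They illustrate the concept and are not li's actual cache API. The scraper fetches each URL, writes the raw bytes without parsing them, and keys every file by fetch time so that repeated runs build up a history.

```python
from __future__ import annotations

import hashlib
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable


def default_fetch(url: str) -> bytes:
    # Plain HTTP GET; the response body is returned as-is, never parsed.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def cache_sources(
    sources: Iterable[tuple[str, str]],
    cache_dir: Path,
    fetch: Callable[[str], bytes] = default_fetch,
    now: datetime | None = None,
) -> list[Path]:
    """Fetch each (source_id, url) pair and store the raw bytes untouched.

    Hypothetical layout: cache_dir/<source_id>/<UTC timestamp>-<url hash>.raw
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H%M%SZ")
    written: list[Path] = []
    for source_id, url in sources:
        try:
            body = fetch(url)
        except Exception as exc:  # one broken source must not stop the run
            print(f"skip {source_id}: {exc}")
            continue
        # A short URL hash keeps files distinct when one source lists several URLs.
        digest = hashlib.sha256(url.encode()).hexdigest()[:8]
        path = cache_dir / source_id / f"{stamp}-{digest}.raw"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(body)
        written.append(path)
    return written
```

Passing `fetch` in as a parameter keeps the sketch testable without network access. Because nothing is parsed at fetch time, a scraper for a given source can be written later and replayed over every timestamped file that has already been cached.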

Why do you need this feature or component?

This would let non-technical volunteers vet and contribute sources from around the world so that we can start caching them. Many sources don't publish time-series data, so it's a "race against time": if we want to eventually have temporal data for everything, we need to start caching their daily snapshots now.

Additional context

As per @chunder's suggestion, I started a spreadsheet (WIP) that this scraper would draw from.
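If the spreadsheet were exported as CSV, the scraper could read its source list along the lines of the minimal sketch below. The `id` and `url` column names and the `load_source_list` helper are hypothetical, since the spreadsheet's actual layout isn't shown here. Rows with a blank id, a duplicate id, or a non-http(s) URL are skipped, so a typo from a volunteer drops a single row instead of breaking the whole run.

```python
from __future__ import annotations

import csv
import io
from urllib.parse import urlparse


def load_source_list(csv_text: str) -> list[tuple[str, str]]:
    """Parse a CSV export with hypothetical `id` and `url` columns."""
    sources: list[tuple[str, str]] = []
    seen: set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing cells come back as None (or ""), so normalize before checking.
        source_id = (row.get("id") or "").strip()
        url = (row.get("url") or "").strip()
        if not source_id or source_id in seen:
            continue  # blank or duplicate id: keep only the first valid row
        if urlparse(url).scheme not in ("http", "https"):
            continue  # not a fetchable web URL
        seen.add(source_id)
        sources.append((source_id, url))
    return sources
```

Keeping validation this lenient matches the goal of letting non-technical contributors edit the sheet directly. Anything questionable is skipped rather than rejected outright, and vetting can happen later.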