covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

--location does not run just a single scraper, but all #116

Closed (jgehrcke closed this issue 4 years ago)

jgehrcke commented 4 years ago

Current docs:

Run only one scraper

To scrape just one location, use --location/-l:

yarn start --location "Ventura County, CA, USA"

I wanted to test the scraper that I am currently trying to add (DEU):

$ yarn start --location "DEU"
yarn run v1.22.4
$ NODE_OPTIONS='--insecure-http-parser' node cli.js --location DEU
(node:15641) ExperimentalWarning: The ESM module loader is experimental.
⏳ Scraping data for today...
  🐢 Cache miss for https://covid19-germany.appspot.com/now at coronadatascraper-cache/2020-3-17/38f8f7262afb9fab969c26940d40bca8.json
  🚦  Loading data for https://covid19-germany.appspot.com/now from server
(node:15641) Warning: Using insecure HTTP parsing
  ⚡️ Cache hit for https://opendata.arcgis.com/datasets/d14de7e28b0448ab82eb36d6f25b1ea1_0.csv from coronadatascraper-cache/2020-3-17/ea58672b23e340fbf8f209b7af40173c.csv
  ⚡️ Cache hit for https://opendata.arcgis.com/datasets/969678bce431494a8f64d7faade6e5b8_0.csv from coronadatascraper-cache/2020-3-17/cf91be8fb40e473d8ff941f6dd1a8a88.csv
  ⚡️ Cache hit for https://opendata.arcgis.com/datasets/8840fd8ac1314f5188e6cf98b525321c_0.csv from coronadatascraper-cache/2020-3-17/8bd70519a35eb981737cd7daa556178a.csv
  ⚡️ Cache hit for https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html from coronadatascraper-cache/2020-3-17/0ab962e16615857354fc29aa8f09bc3f.html
  ⚡️ Cache hit for https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv from cache/e20883430a9a4c7502d0a9618e49c1a9.csv
  ⚡️ Cache hit for https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv from cache/ead40cbf6519cc41c790c692e8cdf151.csv
  ⚡️ Cache hit for https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv from cache/a5f9d8d23a20f71890a61b119183a5fb.csv
  ⚡️ Cache hit for https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Fallzahlen_Kanton_ZH_total.csv from cache/6a6fd96879ca615faf494dfc19c52224.csv
  ⚡️ Cache hit for https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv from cache/cf1e3fa5950ef69383000f685b0597d0.csv
...

I interrupted the process here. Does --location run every scraper because mine is special in that it defines only a country, with no state or county?

If I can help address that, let me know.
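
For illustration, here is a minimal sketch of how a location filter could handle a scraper that defines only a country. It is not the project's actual code: the function names getName and filterScrapers, the field names, and the exact-match rule are all assumptions made for this example.

```javascript
// Hypothetical sketch, not the project's implementation.
// Build a display name such as "Ventura County, CA, USA" from whichever
// of the fields are defined on a scraper; missing fields are skipped.
function getName(location) {
  return [location.city, location.county, location.state, location.country]
    .filter(Boolean)
    .join(', ');
}

// Keep only the scrapers whose name equals the requested location.
// When no location is requested, every scraper is returned.
function filterScrapers(scrapers, requested) {
  if (!requested) {
    return scrapers;
  }
  return scrapers.filter(scraper => getName(scraper) === requested);
}

// Assumed scraper definitions, including a country-only one.
const scrapers = [
  { county: 'Ventura County', state: 'CA', country: 'USA' },
  { state: 'WA', country: 'USA' },
  { country: 'DEU' }
];

console.log(filterScrapers(scrapers, 'DEU').map(getName)); // → [ 'DEU' ]
```

Because the name is built only from the fields that exist, a country-only scraper gets the name "DEU" and matches --location "DEU" in the same way a county-level scraper matches its full name.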

lazd commented 4 years ago

@jgehrcke fixed! Give it a try.

lazd-desktop:scraper lazd$ yarn start -l "WA, USA"
yarn run v1.22.0
$ NODE_OPTIONS='--insecure-http-parser' node cli.js -l 'WA, USA'
(node:97160) ExperimentalWarning: The ESM module loader is experimental.
⏳ Scraping data for today...
  ⚡️ Cache hit for https://www.doh.wa.gov/Emergencies/Coronavirus from coronadatascraper-cache/2020-3-17/7a0ebaf089c7afc655fee3726d602346.html
✅ Data scraped!
   - 0 cities
   - 1 states
   - 19 counties
   - 0 countries
ℹ️  Total counts (tracked cases, may contain duplicates):
   - 1808 cases
   - 0 tested
   - 0 recovered
   - 96 deaths
   - 1712 active
⏳ Generating features...
  ⚠️  Skipping (unassigned), WA, USA because it's unassigned
✅ Found features for 19 out of 20 regions for a total of 19 features
⏳ Getting population data...
  ❌ (unassigned), WA, USA: ?
✅ Found population data for 19 out of 20 locations
✏️  dist/data.json written
✏️  dist/data.csv written
✏️  dist/features.json written
✏️  dist/report.json written
✏️  dist/ratings.json written
✨  Done in 0.96s.

jgehrcke commented 4 years ago

Thank you Larry!