CivicActions / edscrapers

US Department of Education Data Scraping Kit; see https://us-ed-scraping.ckan.io/dataset
GNU Affero General Public License v3.0

Refactor entry points, logging and I/O standards #51

Status: Closed (closed by nightsh 4 years ago)

nightsh commented 4 years ago

The CLI entry point is now a new ed binary. Sample calls, after installing the package:

$ ed
Usage: ed [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  compare    Run a comparison algorhitm against AIR's resources.
  dash       Run the dash server for displaying HTML statistics.
  scrape     Run a Scrapy pipeline for crawling / parsing / dumping output
  stats      Run a statistics algorhitm on the data extracted to provide...
  transform  Run a transformer on a scraper output to generate data in a...

$ ed scrape -h
Usage: ed scrape [OPTIONS] NAME

  Run a Scrapy pipeline for crawling / parsing / dumping output

Options:
  --cache / --no-cache    Do not use Scrapy cache (i.e. "live" scrape)
  --resume / --no-resume  Resume a previously interrupted scrape
  -v, --verbose           Show INFO and DEBUG messages.
  -q, --quiet             Do not show anything.
  -o, --output PATH       If specified, pipe the output to both STDOUT and the
                          file specified.

  -h, --help              Show this message and exit.
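
For readers who want to experiment with the same command shape without installing the package, the following is a minimal sketch of the scrape subcommand using only the standard library's argparse. It is not the project's implementation (the help output above looks like it comes from a different CLI framework). The defaults for --cache and --resume, the mutual exclusion of -v and -q, and the log_level helper are all assumptions made for illustration.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical argparse mirror of the `ed scrape` interface."""
    parser = argparse.ArgumentParser(prog="ed")
    subcommands = parser.add_subparsers(dest="command", required=True)

    scrape = subcommands.add_parser(
        "scrape",
        help="Run a Scrapy pipeline for crawling / parsing / dumping output",
    )
    scrape.add_argument("name", metavar="NAME", help="Name of the scraper to run")
    # BooleanOptionalAction (Python 3.9+) generates the --flag / --no-flag pair.
    # The default values here are assumptions; the PR does not state them.
    scrape.add_argument("--cache", action=argparse.BooleanOptionalAction, default=True)
    scrape.add_argument("--resume", action=argparse.BooleanOptionalAction, default=False)
    # Assumption: -v and -q cannot be combined.
    noise = scrape.add_mutually_exclusive_group()
    noise.add_argument("-v", "--verbose", action="store_true",
                       help="Show INFO and DEBUG messages.")
    noise.add_argument("-q", "--quiet", action="store_true",
                       help="Do not show anything.")
    scrape.add_argument("-o", "--output", metavar="PATH", default=None,
                        help="Also write the output to this file.")
    return parser


def log_level(verbose: bool, quiet: bool) -> int:
    """Map the -v / -q flags to a logging level (hypothetical helper)."""
    if quiet:
        # One above CRITICAL, so no standard log record passes the filter.
        return logging.CRITICAL + 1
    # Assumption: WARNING is the default when neither flag is given.
    return logging.DEBUG if verbose else logging.WARNING
```

Calling build_parser().parse_args(["scrape", "--no-cache", "-v", "some_scraper"]) yields a namespace with cache set to False and verbose set to True; some_scraper is a placeholder name, not one of the project's scrapers.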

TODO:

osahon-okungbowa commented 4 years ago

OK from my end. Let's give it a production run.