CLI entrypoint changed to a new ed binary. Sample call, after installing the package:
$ ed
Usage: ed [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
compare Run a comparison algorithm against AIR's resources.
dash Run the dash server for displaying HTML statistics.
scrape Run a Scrapy pipeline for crawling / parsing / dumping output
stats Run a statistics algorithm on the data extracted to provide...
transform Run a transformer on a scraper output to generate data in a...
$ ed scrape -h
Usage: ed scrape [OPTIONS] NAME
Run a Scrapy pipeline for crawling / parsing / dumping output
Options:
--cache / --no-cache Do not use Scrapy cache (i.e. "live" scrape)
--resume / --no-resume Resume a previously interrupted scrape
-v, --verbose Show INFO and DEBUG messages.
-q, --quiet Do not show anything.
-o, --output PATH If specified, pipe the output to both STDOUT and the
file specified.
-h, --help Show this message and exit.
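For reviewers unfamiliar with Click, the help output above could come from a command group shaped roughly like the sketch below. This is an illustration only, not the code in this PR: the function names, the option defaults and the body of `scrape` are assumptions. Only the option names and the one-line description are taken from the help text.

```python
import click

# Assumption: only the subcommand accepts -h, since the top-level help lists --help alone.
SUBCOMMAND_SETTINGS = {"help_option_names": ["-h", "--help"]}


@click.group()
def cli():
    """Top-level group; installed as the ed binary (hypothetical wiring)."""


@cli.command(context_settings=SUBCOMMAND_SETTINGS)
@click.argument("name")
@click.option("--cache/--no-cache", default=True,
              help="Toggle the Scrapy cache (default is an assumption).")
@click.option("--resume/--no-resume", default=False,
              help="Resume a previously interrupted scrape.")
@click.option("-v", "--verbose", is_flag=True, help="Show INFO and DEBUG messages.")
@click.option("-q", "--quiet", is_flag=True, help="Do not show anything.")
@click.option("-o", "--output", type=click.Path(), default=None,
              help="Also write the output to this file.")
def scrape(name, cache, resume, verbose, quiet, output):
    """Run a Scrapy pipeline for crawling / parsing / dumping output"""
    # Placeholder body: the real command would start the Scrapy pipeline here.
    click.echo(f"scrape name={name} cache={cache} resume={resume}")


if __name__ == "__main__":
    cli()
```

Keeping every subcommand under one `click.group()` is what allows a single binary to replace the old entry points.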
Add new Click-powered CLI
Remove old CLI entry points, one CLI to rule them all
Add loguru logger (WIP)
Extract all the params from modules into function arguments
TODO:
[x] Replace hardcoded values with extracted params
[x] Replace all print() calls with logger calls
[x] Make all output paths children of ./output
[x] Extract output path into an env var
[x] Move all scraper output to ./output/scrapers to avoid filtering the dir globs
[x] Move all transformer output to ./output/transformers
[x] Move all [file] logging to ./output/logs
[x] Scrapy HTTP cache and jobs are stored within output path
~All output defaults to STDOUT and takes an optional -o argument to also dump into a file~ (won't implement this now)
[x] All input has defaults and takes an optional -i argument to override them (or better, defaults to STDIN)
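The output layout described in the TODO list can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the environment variable name `ED_OUTPUT_PATH` and the helper names are hypothetical (the PR does not name them). Only the `./output` default and the `scrapers`, `transformers` and `logs` subdirectories come from the list above.

```python
import os
from pathlib import Path

# Hypothetical name: the PR does not say what the env var is actually called.
OUTPUT_ENV_VAR = "ED_OUTPUT_PATH"
DEFAULT_OUTPUT = "./output"


def output_root(environ=None) -> Path:
    """Return the output root, taken from the env var or the ./output default."""
    environ = os.environ if environ is None else environ
    return Path(environ.get(OUTPUT_ENV_VAR, DEFAULT_OUTPUT))


def output_dirs(environ=None) -> dict:
    """Map each kind of output to its own child directory of the root."""
    root = output_root(environ)
    return {
        "scrapers": root / "scrapers",
        "transformers": root / "transformers",
        "logs": root / "logs",
    }


def ensure_output_dirs(environ=None) -> dict:
    """Create the directories if they are missing and return the mapping."""
    dirs = output_dirs(environ)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Giving scrapers and transformers separate directories means a glob such as `output/scrapers/*` needs no extra filtering, which is the motivation given in the list.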