catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
MIT License

Develop high level script(s) for managing scraping/archiving #52

Closed zschira closed 1 year ago

zschira commented 2 years ago

The FERC datasets will need a script to manage scraping both the DBF and XBRL data. It may also be useful to create a single high-level script for scraping data from all sources.

zaneselvans commented 2 years ago

We already depend indirectly on the click and typer CLI frameworks, and I think both provide hooks for tab completion and hierarchical (nested) commands, which might be useful in this context. I've often imagined having a hierarchical script for PUDL with a unified interface and help messages, like:

$ pudl scrape ferc1 ferc2 ferc6 ferc60 ferc714
$ pudl archive ferc1 ferc2 ferc6 ferc60 ferc714
$ pudl datastore update-cache ferc1 ferc2 ferc6 ferc60 ferc714
$ pudl ferc2sqlite settings/ferc2sqlite.yml
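
A rough sketch of how the command hierarchy above could be wired up. To keep it runnable without extra dependencies it uses the standard library's argparse as a stand-in rather than click or typer; the same layout should map onto click groups or typer sub-apps. The function name `build_parser`, the help strings, and the `settings_file` argument name are hypothetical, and the subcommands only parse arguments here; nothing is actually scraped or archived.

```python
import argparse

# Dataset names taken from the example commands above.
FERC_DATASETS = ["ferc1", "ferc2", "ferc6", "ferc60", "ferc714"]


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical hierarchical `pudl` argument parser."""
    parser = argparse.ArgumentParser(
        prog="pudl", description="Unified PUDL command-line interface (sketch)."
    )
    commands = parser.add_subparsers(dest="command", required=True)

    # `pudl scrape ...` and `pudl archive ...` share the same argument shape:
    # one or more dataset names, validated against the known list.
    for name, help_text in [
        ("scrape", "Scrape raw data for the given datasets."),
        ("archive", "Archive scraped data for the given datasets."),
    ]:
        sub = commands.add_parser(name, help=help_text)
        sub.add_argument("datasets", nargs="+", choices=FERC_DATASETS)

    # `pudl datastore update-cache ...` is a second level of nesting.
    datastore = commands.add_parser("datastore", help="Manage the datastore.")
    datastore_commands = datastore.add_subparsers(dest="subcommand", required=True)
    update_cache = datastore_commands.add_parser(
        "update-cache", help="Update the local cache for the given datasets."
    )
    update_cache.add_argument("datasets", nargs="+", choices=FERC_DATASETS)

    # `pudl ferc2sqlite <settings file>` takes a single path argument.
    ferc2sqlite = commands.add_parser(
        "ferc2sqlite", help="Run ferc2sqlite using a settings file."
    )
    ferc2sqlite.add_argument("settings_file")

    return parser
```

Calling `build_parser().parse_args(["scrape", "ferc1", "ferc2"])` returns a namespace holding the chosen command and dataset list, which a real implementation would then dispatch to the matching scraper or archiver. Because every subcommand hangs off one parser, `pudl --help` and `pudl datastore --help` come from the same place, which is the unified help behaviour described above.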