catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

Write archiver to regularly run and archive updated CEMS crosswalk #102

Open e-belfer opened 1 year ago

e-belfer commented 1 year ago

This is a continuation of issue #2505 in PUDL, which sets out to update the EPA-EIA crosswalk with 2021 data. The script that performs the updated archiving is in PR #1 in the forked Catalyst repository.

While a static, manually compiled output is a good start, it would be good to have a more reproducible, programmatic process that incorporates any data updates and any updates to the crosswalk repo (process changes or manual mapping additions), and that archives these outputs in a manner consistent with our other data sources.

zschira commented 1 year ago

I've updated the archiver to create a 2021 archive that just archives the zip of our fork and looks essentially the same as the 2018 archive. I think the functionality to dynamically generate the crosswalk would still be valuable, so I'm going to leave this issue open, but it probably won't be high priority for a while, so I'm moving it to the icebox.
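
For anyone picking this up later, here is a minimal sketch of how a regularly scheduled run could decide whether the fork's zip needs re-archiving. It is not the archiver's actual code: the helper names (`github_zip_url`, `sha256_of`, `has_changed`) and the `example-org`/`example-repo` values are hypothetical, and it assumes the downloaded zip bytes only change when the repository contents change, which may not always hold.

```python
import hashlib
from typing import Optional


def github_zip_url(owner: str, repo: str, branch: str = "main") -> str:
    """Build the URL GitHub serves a branch snapshot zip from."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the downloaded bytes."""
    return hashlib.sha256(data).hexdigest()


def has_changed(new_bytes: bytes, previous_checksum: Optional[str]) -> bool:
    """True if there is no earlier archive or the new zip differs from it."""
    if previous_checksum is None:
        return True
    return sha256_of(new_bytes) != previous_checksum


# Hypothetical owner/repo names, used only to show the URL shape.
url = github_zip_url("example-org", "example-repo")
```

A scheduled job would download `url`, call `has_changed` with the checksum stored for the last archive, and only publish a new version when it returns `True`.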