catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

Specify archiver success & failure conditions #70

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

Our goal is to have the archivers running on an automated schedule in the background, taking snapshots of the original data sources which can be accessed programmatically. This will minimize the overhead associated with keeping our raw inputs up to date, but we still need the system to alert us when something goes wrong so we can fix it.

- [ ] https://github.com/catalyst-cooperative/pudl-archiver/issues/71
- [ ] https://github.com/catalyst-cooperative/pudl-archiver/issues/72
- [ ] Define `eia860` archiver success conditions
- [ ] Define `eia860m` archiver success conditions
- [ ] Define `eia861` archiver success conditions
- [ ] Define `eia923` archiver success conditions
- [ ] Define `eia_bulk_elec` archiver success conditions
- [ ] Define `eiawater` archiver success conditions
- [ ] Define `epacamd_eia` archiver success conditions
- [ ] https://github.com/catalyst-cooperative/pudl-archiver/issues/214
- [ ] Define `ferc1` archiver success conditions
- [ ] Define `ferc2` archiver success conditions
- [ ] Define `ferc6` archiver success conditions
- [ ] Define `ferc60` archiver success conditions
- [ ] Define `ferc714` archiver success conditions
- [ ] Define `mshamines` archiver success conditions
- [ ] Define `phmsagas` archiver success conditions

zschira commented 1 year ago

I think having success/failure conditions like this is a really good idea for making automated archives useful and, hopefully, for catching errors early.

In general, we expect the set of data partitions to either remain constant or grow over time.

I think we could probably fail, or at least require human review, any time an archive run would delete a partition outright, since that's almost always unexpected.
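
To make the partition rule concrete, here is a rough sketch of what such a check might look like. It assumes partitions can be represented as a set of string labels (e.g. years); `PartitionCheck` and `check_partitions` are hypothetical names for illustration, not existing pudl-archiver code.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionCheck:
    """Result of comparing partitions between two snapshots (hypothetical)."""

    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)

    @property
    def success(self) -> bool:
        # Growing or staying constant is fine; any deleted partition fails.
        return not self.removed


def check_partitions(baseline: set[str], new: set[str]) -> PartitionCheck:
    """Compare the previous snapshot's partitions against the new one's."""
    return PartitionCheck(added=new - baseline, removed=baseline - new)


# A new year appearing passes; a year disappearing is flagged.
grew = check_partitions({"2020", "2021"}, {"2020", "2021", "2022"})
shrank = check_partitions({"2020", "2021"}, {"2021"})
print(grew.success, sorted(grew.added))
print(shrank.success, sorted(shrank.removed))
```

Reporting the removed partitions, rather than just a boolean, would give whoever reviews the failure enough context to decide whether the deletion was expected.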

Another thing we should probably start considering is procedures for handling failures. For example, we should plan some sort of human intervention mechanism for cases where we deem an archive to be acceptable even though it generated a failure.
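
One possible shape for that intervention, sketched under the assumption that each failed check has a string name and that a reviewer can record the names they have signed off on. The function names and the check names below are hypothetical, not part of pudl-archiver.

```python
def unresolved_failures(failures: list[str], approved: set[str]) -> list[str]:
    """Return the failed checks that no human has approved, in order."""
    return [name for name in failures if name not in approved]


def can_publish(failures: list[str], approved: set[str]) -> bool:
    """An archive may be published only if every failure was approved."""
    return not unresolved_failures(failures, approved)


# Hypothetical check names: one failure was reviewed and accepted, one was not.
failures = ["partition_removed", "file_size_changed"]
approved = {"partition_removed"}
print(unresolved_failures(failures, approved))
print(can_publish(failures, approved))
```

Keeping the approvals explicit and per-check means an override for one known problem would not silently mask a new, unrelated failure.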