Add retry capabilities to scraping to verify that downloaded archives are valid

catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.

Add retry capabilities to scraping to verify that downloaded archives are valid #51

Closed: zschira closed this issue 1 year ago

zschira commented 1 year ago

Some data sources (like EPA CEMS) are prone to producing bad archives because of flaky downloads. Currently the only solution is to manually verify the archives and rerun the scraping process as needed. To automate our scraping/archiving, we need automated retry capabilities that verify each downloaded archive and rerun the scraping process whenever verification fails.
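A rough sketch of what the verify-and-retry loop could look like, not the actual pudl-scrapers implementation. It assumes the archives are zip files and that a bad download shows up as an unreadable zip or a failed CRC check. The names `is_valid_zip`, `scrape_with_retry`, and the `download` callable are hypothetical and only illustrate the idea.

```python
import time
import zipfile
import zlib
from pathlib import Path
from typing import Callable


def is_valid_zip(path: Path) -> bool:
    """Return True if path is a readable zip whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False


def scrape_with_retry(
    download: Callable[[Path], None],  # hypothetical: writes one archive to the given path
    dest: Path,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Download to dest, verify it, and retry on failure. Returns the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        download(dest)
        if is_valid_zip(dest):
            return attempt
        # Discard the bad archive so it can never be mistaken for a good one.
        dest.unlink(missing_ok=True)
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"No valid archive at {dest} after {max_attempts} attempts")
```

Passing the download step in as a callable keeps the verification and retry logic independent of any particular scraper, so the same wrapper could be reused across data sources.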

zaneselvans commented 1 year ago

There's also #39, which is related to this.

zschira commented 1 year ago

Ah ok. I'm going to close this one and add that one to the epic.