catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
MIT License

Validate zipfiles and re-download if corrupted #39

Closed zaneselvans closed 1 year ago

zaneselvans commented 1 year ago

While creating this EIA-923 archive, everything seemed to go fine, but the eia923-2020.zip file turned out to be corrupted or incomplete (even though it was about the right size), and I didn't catch it until I tried to run the ETL. The file on the EIA website seems fine, and the copy the scraper downloaded to my local machine is corrupted too, so it appears to have been a downloading glitch.

We should at least check that zipfiles we download are valid (without getting into the contents) to prevent this kind of thing from happening.
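A minimal sketch of what such a check could look like, using only Python's standard `zipfile` module. The names `zip_is_valid`, `download_validated`, and the `fetch` callable are hypothetical and are not taken from the scraper code; `fetch` stands in for whatever function actually writes the download to disk. Note that `zipfile.is_zipfile` only confirms that the archive's end-of-central-directory record is present, while `ZipFile.testzip` reads every member to verify its CRC, so the second step does touch the contents and could be dropped if only the cheap check is wanted.

```python
import zipfile
import zlib
from pathlib import Path
from typing import Callable


def zip_is_valid(path: Path) -> bool:
    """Return True if `path` is a readable zip archive whose members pass CRC checks."""
    # Cheap structural check: is there a zip end-of-central-directory record?
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads each member and returns the first bad name, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Truncated or damaged archives can fail in several ways while reading.
        return False


def download_validated(
    fetch: Callable[[Path], None], dest: Path, max_attempts: int = 3
) -> Path:
    """Call `fetch(dest)` until the result is a valid zip, up to `max_attempts` times.

    `fetch` is a hypothetical stand-in for the real download routine.
    """
    for _ in range(max_attempts):
        fetch(dest)
        if zip_is_valid(dest):
            return dest
        # Discard the corrupted file before retrying.
        dest.unlink(missing_ok=True)
    raise RuntimeError(f"{dest.name} was still corrupted after {max_attempts} attempts")
```

Raising after a bounded number of retries, rather than looping indefinitely, would let a persistent upstream problem surface at archive time instead of during the ETL.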

zschira commented 1 year ago

Closing; see the new archiver repo.