catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

Archive FERC EQR pre-2013 #149

Open zschira opened 10 months ago

zschira commented 10 months ago

The existing EQR archiver only archives data from 2013 to the present. There is commented-out code for archiving earlier years, but as of now it does not work. This is because those years of data are only available on an FTP server that has a global 3-user limit (see here for more info). As a workaround, the commented-out code repeatedly pings the FTP server until it gets a successful response, then tries to download the data. In testing so far, however, the archiver has never managed to connect to the server, even when left running all night.
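For reference, the retry-until-connected workaround described above could look roughly like the sketch below. This is not the commented-out code from the archiver. The function name `connect_with_retry`, its parameters, and the host in the usage comment are all hypothetical, and the `connect` callable is injected so the loop can be exercised without touching the real server.

```python
import ftplib
import time
from typing import Callable


def connect_with_retry(
    connect: Callable[[], ftplib.FTP],
    max_attempts: int = 100,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ftplib.FTP:
    """Call `connect` until it succeeds or `max_attempts` is exhausted.

    `connect` is any zero-argument callable that returns an open FTP
    connection or raises one of `ftplib.all_errors` (for example when the
    server refuses the session because its user limit is reached).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ftplib.all_errors as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                sleep(wait_seconds)
    raise ConnectionError(
        f"No successful FTP connection after {max_attempts} attempts"
    ) from last_error


# Hypothetical usage (the host name is a placeholder, not the FERC server):
# ftp = connect_with_retry(lambda: ftplib.FTP("ftp.example.org", timeout=30))
# ftp.login()
```

Capping the number of attempts and raising at the end means an overnight run ends with an explicit error rather than looping forever.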

zaneselvans commented 4 months ago

I manually archived the 2002-2013 data, using a copy which I downloaded from FERC on 2022-01-24.

The Zenodo record is here: https://zenodo.org/records/10086109

Given that this archive was hand-compiled and doesn't provide metadata or partitions via a datapackage.json file, maybe the easiest thing to do is to use the Zenodo archive itself as the "source" for the early EQR data, in an archiver that also pulls the later data directly from FERC?
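A minimal sketch of treating that Zenodo record as a source, listing its files so an archiver could download them. It assumes the Zenodo REST API serves the record as JSON at `https://zenodo.org/api/records/<id>`, with a `files` list whose entries carry a `key` (file name) and a `links.self` download URL. That layout should be checked against a live response before relying on it. The function names are hypothetical and are not part of pudl-archiver.

```python
import json
import urllib.request

# Record ID taken from the Zenodo URL above.
ZENODO_RECORD_ID = "10086109"


def record_api_url(record_id: str) -> str:
    """Build the (assumed) Zenodo REST API URL for a record."""
    return f"https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id: str) -> dict:
    """Download the record's JSON metadata (makes a network request)."""
    with urllib.request.urlopen(record_api_url(record_id), timeout=60) as resp:
        return json.load(resp)


def list_record_files(record: dict) -> dict[str, str]:
    """Map each file name in the record to its download URL.

    Assumes each entry in `record["files"]` has `key` and `links.self`.
    """
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}


# Hypothetical usage:
# files = list_record_files(fetch_record(ZENODO_RECORD_ID))
# for name, url in files.items():
#     print(name, url)
```

Keeping the parsing step separate from the network call lets the file listing be tested against a saved response.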