catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars, 106 forks

Set up OpenVPN on Travis CI to allow FTP of ferc1 & epacems data #345

Closed (zaneselvans closed 4 years ago)

zaneselvans commented 5 years ago

The Travis CI build servers undergo dynamic load balancing constantly, which means the IP addresses of individual instances are always changing. This makes it impossible for anything to reliably connect to the outside world over FTP, which is why the ferc1 and epacems data have been impossible to download there for testing (not because Travis has been blacklisted by those agencies, whew!).

Unfortunately, neither FERC nor EPA supports access to these files over SFTP, HTTP, or HTTPS, any of which would work fine. Barring changes by those agencies, it is apparently possible to work around this limitation by setting up OpenVPN on the Travis CI build instance and connecting to the outside world through that VPN tunnel. Not sure exactly what all we would need to do to make that work, but maybe we can figure it out... This would be preferable to the current fake-data setup, since it would mean we were regularly testing the download of all the datasets, and we wouldn't need to have the big honking fake data checked in to git.
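For illustration, here is a minimal sketch of how a test run might check whether anonymous FTP actually works from the build machine before attempting the real downloads. It is not part of PUDL: the function name `ftp_reachable` and its `ftp_factory` parameter are hypothetical, and it uses only the standard-library `ftplib`. It assumes the server accepts anonymous logins and passive-mode transfers.

```python
from ftplib import FTP, all_errors


def ftp_reachable(host, timeout=10.0, port=21, ftp_factory=FTP):
    """Return True if an anonymous FTP login and directory listing succeed.

    `ftp_factory` is a hypothetical hook so the check can be exercised
    without a network; by default it is the standard-library FTP class.
    """
    ftp = ftp_factory(timeout=timeout)
    try:
        ftp.connect(host, port)  # open the control connection
        ftp.login()              # anonymous login
        ftp.set_pasv(True)       # passive mode for the data connection
        ftp.nlst()               # a listing forces a data connection
        return True
    except all_errors:
        # Covers FTP protocol errors, socket errors, and timeouts.
        return False
    finally:
        ftp.close()
```

A test suite could use the result to skip the FTP download tests (for example with `pytest.mark.skipif`) instead of failing outright on build machines where FTP is blocked.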

More details from Travis CI here.

zaneselvans commented 4 years ago

In connection with #182 and potentially generating versioned archives of all this data at Zenodo, this may not be necessary at all -- iceboxing it for now.

zaneselvans commented 4 years ago

No longer relevant, as we're moving forward with versioned Zenodo archives as the inputs.