catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Set up continuous integration #147

Closed: karldw closed this issue 5 years ago

karldw commented 6 years ago

Some of the tests take a ton of memory, but I'm guessing they won't be in the fast subset.

karldw commented 6 years ago

FYI: Travis can cache things for you.

zaneselvans commented 6 years ago

I have Travis CI set up and working. Currently, it runs any test marked ci, and the only test so marked at the moment is a Hello World. Now we need to start building a suite of tests that can actually run on Travis, and at other times when we just want to test our code without loading the entire universe of data.
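A minimal sketch of how a `ci` marker like this could be registered and applied with pytest. Only the marker name comes from the comment above; the `test_hello_world` name and its body are hypothetical placeholders, not PUDL's actual test code. With the marker in place, `pytest -m ci` selects only the marked tests, so slow or memory-hungry tests stay out of the CI run.

```python
import pytest


def pytest_configure(config):
    """Register the custom `ci` marker (this hook normally lives in
    conftest.py) so pytest does not warn about an unknown mark."""
    config.addinivalue_line("markers", "ci: fast tests that can run on Travis CI")


@pytest.mark.ci
def test_hello_world():
    """Hypothetical placeholder test, selected by `pytest -m ci`."""
    assert "Hello, World!".startswith("Hello")
```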

What do we think that looks like? Could we do one year's worth of data, or is that still too much? Maybe one year of FERC & EIA, plus one year of CEMS for a single state? It looks like the Travis platform has network access: would we be allowed to just have the tests download fresh data directly from the federal sources, rather than having it checked into the repository? That would have the advantage of also testing the download / datastore process. I'm not sure what constitutes reasonable use on Travis, though.
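Downloading from the federal sources during the build pairs naturally with Travis's caching: if the datastore directory is cached between builds, the download step only needs to fetch files that are missing. A minimal sketch of that pattern, assuming each source URL ends in a file name. `fetch_if_missing` and its injectable `fetch` parameter are hypothetical names, not part of PUDL's datastore code.

```python
import urllib.request
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def _download(url: str, dest: Path) -> None:
    """Fetch `url` over the network and write its bytes to `dest`."""
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())


def fetch_if_missing(
    url: str,
    datastore_dir: Path,
    fetch: Callable[[str, Path], None] = _download,
) -> Path:
    """Return the local path for `url`, downloading it only if it is not
    already in `datastore_dir` (e.g. a directory restored from a CI cache).

    Assumes the URL path ends in a file name.
    """
    datastore_dir.mkdir(parents=True, exist_ok=True)
    dest = datastore_dir / Path(urlparse(url).path).name
    if not dest.exists():
        # Write to a temporary name first so an interrupted download is
        # never mistaken for a complete, cached file on the next build.
        tmp = dest.with_name(dest.name + ".part")
        fetch(url, tmp)
        tmp.replace(dest)
    return dest
```

Injecting `fetch` keeps the caching logic testable offline, while the default still hits the network on a real build.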

zaneselvans commented 6 years ago

The Travis CI builds are working. Rather than categorizing some tests as fast, I created (at least for now) a separate travis_ci_test.py module, which runs the ETL on a small subset of the data. That data is downloaded to a local datastore on the Travis VM. Right now it covers FERC Form 1 & EIA860 for 2012 and 2016, and EIA923 & EPA CEMS for just 2016. In addition, the CEMS data only covers Colorado, to minimize run time, disk use, etc.
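One way to keep a subset like this easy to adjust is to describe it as data and expand it into ETL jobs. A minimal sketch, assuming EIA923 and EPA CEMS each cover only 2016; `DatasetSubset`, `CI_SUBSET`, `etl_jobs`, and the dataset identifiers are all hypothetical and do not reflect how travis_ci_test.py is actually written.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class DatasetSubset:
    """Hypothetical description of one dataset's slice of the CI test data."""

    name: str
    years: Tuple[int, ...]
    states: Optional[Tuple[str, ...]] = None  # None means "all states"


# The subset from the comment above: FERC Form 1 and EIA860 for 2012 and
# 2016, EIA923 and EPA CEMS for 2016 only, with CEMS limited to Colorado.
CI_SUBSET = (
    DatasetSubset("ferc1", (2012, 2016)),
    DatasetSubset("eia860", (2012, 2016)),
    DatasetSubset("eia923", (2016,)),
    DatasetSubset("epacems", (2016,), states=("CO",)),
)


def etl_jobs(
    subsets: Iterable[DatasetSubset],
) -> Iterator[Tuple[str, int, Optional[str]]]:
    """Expand the subset spec into (dataset, year, state) jobs for the ETL."""
    for subset in subsets:
        for year in subset.years:
            for state in subset.states or (None,):
                yield subset.name, year, state
```

Widening the test data later (another year, another state) then means editing `CI_SUBSET` rather than the test logic.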

karldw commented 6 years ago

Great! The only advantage of adding AppVeyor would be testing on Windows as well.

For the simpler functions, we could add tests with dummy datasets.
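For example, a small pure function can be checked against a handful of hand-made records whose correct answers are known in advance, with no downloads at all. In this sketch `capacity_factor` is a hypothetical helper, not a PUDL function, and the dummy records are made up purely for illustration.

```python
import math


def capacity_factor(net_generation_mwh: float, capacity_mw: float, hours: float) -> float:
    """Fraction of its maximum possible output that a generator produced.

    Hypothetical helper, used only to illustrate a dummy-data test.
    """
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return net_generation_mwh / (capacity_mw * hours)


def test_capacity_factor_with_dummy_data():
    # Tiny hand-made records whose answers are easy to verify by hand:
    # (net generation MWh, nameplate capacity MW, hours, expected factor).
    dummy_records = [
        (876.0, 1.0, 8760.0, 0.1),   # 10% of a 1 MW plant's output over a year
        (4380.0, 1.0, 8760.0, 0.5),  # half of the maximum output
        (0.0, 50.0, 744.0, 0.0),     # idle for a 31-day month
    ]
    for gen, cap, hours, expected in dummy_records:
        assert math.isclose(capacity_factor(gen, cap, hours), expected)
```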