catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Create test for the epacems_to_parquet conversion script #351

Closed zaneselvans closed 4 years ago

zaneselvans commented 4 years ago

Along the lines of #260, we need to test the code underlying the epacems_to_parquet console script that we distribute. An initial test, which tries to convert one year's worth of Idaho data, now lives in test/etl_test.py.

It passes when run locally against a complete PUDL database, but it fails on Travis, and also when run locally as part of tox -v -e etl -- --fast (currently only available on the python-packaging branch). In both cases the failure is in fix_up_dates, which complains that it cannot find five plant IDs:

ValueError: utc_offset should never be missing for CEMS plants, but was missing for these: [7456, 7953, 55179, 55733, 57028]

The epacems transform step depends on the PUDL database being available: it reads the timezone information associated with the plants listed in CEMS in order to do the UTC timezone correction. For some reason this seems to fail when there is only one year of data in the PUDL database. More debugging is required. @karldw, do you have any ideas off the top of your head?
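
For anyone trying to reproduce this, here is a minimal sketch of the kind of check that seems to be failing. It is not the actual fix_up_dates code. The function names, the dictionary standing in for the plant timezone lookup, and the plant IDs 1 and 2 are all hypothetical; only the error message text and the missing plant IDs are taken from the traceback above. The idea is that any CEMS plant with no entry in the timezone lookup ends up with no utc_offset, which would happen if the database only contains plants from a single year.

```python
from datetime import timedelta

# Hypothetical stand-in for the plant timezone lookup read from the PUDL
# database. In this sketch, only plants present in the loaded data have entries.
PLANT_UTC_OFFSETS = {
    1: timedelta(hours=-7),
    2: timedelta(hours=-8),
}


def find_missing_utc_offsets(cems_plant_ids, plant_utc_offsets):
    """Return the sorted, de-duplicated CEMS plant IDs with no UTC offset."""
    return sorted(set(cems_plant_ids) - set(plant_utc_offsets))


def check_utc_offsets(cems_plant_ids, plant_utc_offsets):
    """Raise ValueError if any CEMS plant lacks a UTC offset (assumed behavior)."""
    missing = find_missing_utc_offsets(cems_plant_ids, plant_utc_offsets)
    if missing:
        raise ValueError(
            "utc_offset should never be missing for CEMS plants, "
            f"but was missing for these: {missing}"
        )


if __name__ == "__main__":
    # CEMS records that reference plants absent from the lookup table.
    cems_ids = [1, 2, 7456, 7953]
    print(find_missing_utc_offsets(cems_ids, PLANT_UTC_OFFSETS))
```

Running a check like find_missing_utc_offsets against the one-year database and against the complete one should show whether the five plants are simply absent from the smaller database.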

swinter2011 commented 4 years ago

testing

swinter2011 commented 4 years ago

sync test