Closed grgmiller closed 4 years ago
The system is really meant to run with the same years for 923 and 860 -- we really haven't done any testing with different years in those two datasets (except insofar as the 2009-2010 data for 860 hasn't been integrated yet, and it is pulled in for 923). So I would recommend running the ETL for all of the years of data that you want to have access to all at once, across 860, 923, and CEMS.
If you want to load all the years/states that are available within CEMS (or a big chunk of them anyway) I would recommend creating two data packages in the bundle -- one with 923 and 860, and another with 860, 923, and CEMS. Then you can load the EIA data into the SQLite DB, and convert the CEMS data to Apache Parquet files from the datapackage (with the epacems_to_parquet script), since loading the entire CEMS dataset into SQLite takes an unknown amount of time that is longer than 30 hours (on my laptop at least). You can slurp the Parquet files into Pandas or Dask dataframes directly, and (since they were generated in the same ETL run as the EIA data) you should be able to smoothly merge them with the EIA data as needed in your analysis / calculations.
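To illustrate the merge step described above, here is a minimal sketch in pandas. The column names (`plant_id_eia`, `report_year`, `gross_load_mw`, `plant_name`) and the helper `merge_cems_with_eia` are assumptions made up for the example, not names confirmed by this thread, and the small in-memory frames stand in for the real data. In practice the CEMS frame would come from `pd.read_parquet(...)` on the converted files and the EIA frame from the SQLite DB.

```python
import pandas as pd


def merge_cems_with_eia(cems, eia, keys=("plant_id_eia", "report_year")):
    """Left-join CEMS rows onto EIA plant records using shared key columns.

    validate="many_to_one" makes pandas raise if the EIA side has
    duplicate keys, which would silently multiply CEMS rows.
    """
    return cems.merge(eia, on=list(keys), how="left", validate="many_to_one")


# Stand-in data (hypothetical columns and values, for illustration only).
# With real outputs, `cems` would be loaded via pd.read_parquet(<your path>).
cems = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "report_year": [2014, 2014, 2014],
        "gross_load_mw": [10.0, 12.0, 7.5],
    }
)
eia = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "report_year": [2014, 2014],
        "plant_name": ["A", "B"],
    }
)

merged = merge_cems_with_eia(cems, eia)
print(merged)
```

A left join keeps every CEMS row even when no EIA record matches, so gaps show up as missing values rather than dropped rows.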
I was including all years for EIA 860 due to the workaround in https://github.com/catalyst-cooperative/pudl/issues/467.
However, I can try re-running the ETL with eia860_years: [2011,2012,2013,2014] or just eia860_years: [2014] and will update if it works.
It looks like from the settings file you included up there that only the 2014 EIA 923 data was included though. They're pretty entangled. Once we debug the 2009-2010 EIA 860 data we might just require the same years for both of them all the time. Dunno.
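Until something like that is enforced, a quick check on the settings before a long ETL run can catch mismatched years. The sketch below is only an illustration: `eia860_years` appears in this thread, but `eia923_years` and the flat dictionary layout are assumptions about the settings file, and `check_matching_years` is a hypothetical helper, not part of PUDL.

```python
def check_matching_years(settings):
    """Return the years that appear in only one of the two EIA year lists."""
    years_860 = set(settings.get("eia860_years", []))
    # "eia923_years" is an assumed key name, mirroring "eia860_years".
    years_923 = set(settings.get("eia923_years", []))
    # Symmetric difference: years present in exactly one of the two sets.
    return sorted(years_860 ^ years_923)


# Example settings resembling the situation described above.
settings = {"eia860_years": [2011, 2012, 2013, 2014], "eia923_years": [2014]}

mismatched = check_matching_years(settings)
if mismatched:
    print(f"Years present for only one of EIA 860 / EIA 923: {mismatched}")
```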
Describe the bug
When I run the ETL for data prior to 2016 (I got the same error for both 2014 and 2015), I get the following error:
Bug Severity
How badly is this bug affecting you?
To Reproduce
Steps to reproduce the behavior -- ideally including a code snippet that causes the error to appear.
In anaconda prompt, I ran:
(pudl) C:\Users\gmiller7\Box\PUDL>pudl_etl settings/2014_data.yml
The YAML file contains the following:
Expected behavior
I expected the ETL process to finish without error, like it did for the 2016 and 2017 data.
Software Environment?
Additional context
I am running the ETL for an entire year at a time, for the entire US.