grgmiller closed this issue 4 years ago
This seems potentially related to https://github.com/catalyst-cooperative/pudl/issues/351
Hey @grgmiller, the code is using the EIA 860 data to calculate the utc_offset. A workaround here is to use a wider range of eia860_years in your settings (e.g. [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]).
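To see which plants would trip the error before running the full ETL, a check along these lines may help. This is only a sketch, not PUDL's code: the function name `find_plants_missing_offset`, the plain-dict lookup, the integer hour offsets, and every plant ID except 55422 are assumptions made for illustration.

```python
def find_plants_missing_offset(cems_plant_ids, plant_utc_offset):
    """Return the sorted CEMS plant IDs that have no usable UTC offset.

    Hypothetical helper: `plant_utc_offset` maps plant ID -> offset in hours,
    with None (or a missing key) meaning no offset could be derived from
    the EIA 860 data that was loaded.
    """
    return sorted(
        plant_id
        for plant_id in set(cems_plant_ids)
        if plant_utc_offset.get(plant_id) is None
    )


# Made-up lookup table; only plant 55422 comes from the error message.
plant_utc_offset = {880001: -5, 880002: -6, 55422: None}
# Plant IDs as they might appear (with repeats) in hourly CEMS records.
cems_plant_ids = [880001, 55422, 880002, 55422]

missing = find_plants_missing_offset(cems_plant_ids, plant_utc_offset)
if missing:
    print(f"utc_offset is missing for these plants: {missing}")
```

If the list is non-empty, widening eia860_years as suggested above is the thing to try first.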
Ref #178 (plant matching) and #250 (original issue about UTC)
Thanks for the suggestion @karldw
I ran the same code again, but this time edited the settings so that eia923_years, eia860_years, and epacems_years are all set to [2016,2017]
However, at the same point in the process, this time while performing ETL for the 2016 CEMS file, I get the same ValueError.
2019-11-05 18:58:40 [ INFO] pudl.extract.epacems:64 Performing ETL for EPA CEMS hourly FL-2016-12
2019-11-05 18:58:46 [ INFO] pudl.load.csv:97 ===================== Dramatic Pause ====================
2019-11-05 18:58:46 [ INFO] pudl.load.csv:99 Loading 5,597,592 records (790 MB) into PUDL.
2019-11-05 19:00:52 [ INFO] pudl.load.csv:106 ================ Resume Number Crunching ================
Traceback (most recent call last):
File "C:\Users\gmiller7\anaconda3\envs\pudl\Scripts\pudl_etl-script.py", line 9, in <module>
sys.exit(main())
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\cli.py", line 99, in main
clobber=args.clobber)
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\etl.py", line 790, in generate_data_packages
pkg_tables = etl_pkg(pkg_settings, pudl_settings, pkg_bundle_dir)
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\etl.py", line 722, in etl_pkg
pkg_dir=pkg_dir
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\etl.py", line 398, in _etl_epacems_pkg
pkg_dir))
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\etl.py", line 363, in _etl_epacems_part
for transformed_df_dict in epacems_transformed_dfs:
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\transform\epacems.py", line 252, in transform
.pipe(fix_up_dates, plant_utc_offset=plant_utc_offset)
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pandas\core\generic.py", line 5028, in pipe
return com._pipe(self, func, *args, **kwargs)
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pandas\core\common.py", line 483, in _pipe
return func(obj, *args, **kwargs)
File "C:\Users\gmiller7\anaconda3\envs\pudl\lib\site-packages\pudl\transform\epacems.py", line 55, in fix_up_dates
f"utc_offset should never be missing for CEMS plants, but was "
ValueError: utc_offset should never be missing for CEMS plants, but was missing for these: [55422]
Do I need to try expanding my year range even more, or is there another workaround available?
I would do all the available years of EIA 860 -- 2011-2017. Many of the static entities (plants, generators, utilities) have inconsistently reported values across years, and we set a consistency threshold for those values, below which they get set to NaN. So if one of the location fields that's being used to infer timezone (and thus UTC offset) is too inconsistent, the offset will be unavailable.
We need to make this all more robust. Really, the expectation is that most users will just download the pre-compiled data once we're publishing it on a regular basis, rather than running the whole involved ETL process themselves. But we're not quite there yet.
Also note that you can expand the set of EIA 860 years w/o expanding the years for 923 or CEMS if you don't want to.
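To illustrate why more EIA 860 years can help, here is a rough sketch of the consistency-threshold idea described above. It is not the actual PUDL implementation: the function name `harmonize_static_value`, the 0.7 cutoff, and the example reports are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def harmonize_static_value(reported, threshold=0.7):
    """Pick one value for a static field from its year-by-year reports.

    Hypothetical helper: returns the most common reported value if its
    share of the non-missing reports is at least `threshold`, otherwise
    NaN. The 0.7 default is an assumed cutoff, not PUDL's setting.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return math.nan
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else math.nan


# Made-up reports of one plant's state, one entry per EIA 860 year.
two_years = ["FL", "GA"]  # 50% agreement: below the cutoff
seven_years = ["FL", "FL", "FL", "GA", "FL", "FL", "FL"]  # about 86% agreement

print(harmonize_static_value(two_years))
print(harmonize_static_value(seven_years))
```

With only two years, a single conflicting report is enough to drop the field to NaN, which would leave no location to infer a timezone from. With seven years, the same stray report is outvoted.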
Thank you @zaneselvans, that seems to have done the trick! I edited my settings file to include the following settings:
eia923_years: [2017]
eia860_years: [2011,2012,2013,2014,2015,2016,2017]
epacems_years: [2017]
I now have a valid data package for all epacems data for 2017.
Just to note, this chain helped me as I ran into this issue too. :) Thank you!
Describe the bug
When running pudl_etl to create a database of CEMS data from 2017, I get the following error: ValueError: utc_offset should never be missing for CEMS plants, but was missing for these: [55422]
Bug Severity
High: This bug is preventing me from using PUDL.
To Reproduce
In the Anaconda prompt, I enter pudl_etl settings/2017-data.yml. The text of the settings file and the console output are attached: issue_output.txt, settings file text.txt
The process gets through the ETL for FL-2017-12, takes a dramatic pause, and as soon as it resumes number crunching, it throws the error.
Expected behavior
I expected pudl_etl to create a complete datastore of CEMS data.
Software Environment?
Additional context
I have only downloaded eia860, eia923, and epacems, not ferc1 or epaipm