catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

N/A heat rates for plants that only report gen/fuel in one month of the year #2062

Open gschivley opened 1 year ago

gschivley commented 1 year ago

Describe the bug

A number of small plants appear to report all of their generation/fuel consumption for at least some units in a single month of the year (e.g., 10805, 50748, 50850, 56356, 57666, and 58161). When calculating unit heat rates via pudl_out, the function pudl.helpers.sum_na is applied to the generation and boiler fuel data. This function returns N/A when any of the values in a year are N/A. With all fuel/generation reported in a single month, the other months are N/A, so the annual totals and the resulting heat rates are N/A as well, and it isn't possible to get valid data for these plants.
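To illustrate the behavior described above, here is a minimal sketch using plain pandas rather than PUDL itself. It assumes that sum_na behaves like a sum with skipna=False (based only on the description in this issue), and the monthly values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly net generation for one unit:
# nothing reported for January through November, everything in December.
monthly_gen = pd.Series([np.nan] * 11 + [1200.0])

# A sum that propagates missing values. This is an assumption about what
# sum_na does, not a copy of the PUDL implementation.
strict_total = monthly_gen.sum(skipna=False)

# The pandas default skips missing values instead.
lenient_total = monthly_gen.sum()

print(strict_total)   # nan
print(lenient_total)  # 1200.0
```

With the strict sum, a single missing month is enough to make the annual total N/A, which is why these units never get a heat rate.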

Bug Severity

Medium: The units tend to be small and I can generally work around it. But it was a pain to figure out why some units don't have a valid heat rate in any year of the data set.

To Reproduce

pudl_out.bf_eia923().query("plant_id_eia==10805")
pudl_out.gen_eia923().query("plant_id_eia==10805")

Expected behavior

I'm not sure how best to generalize, but if both the generation and boiler fuel are not reported in a month (or in the first 11 months of a year?), then drop those months and keep only the valid months of data before aggregating.
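A rough sketch of that idea, again in plain pandas and not a proposed patch. It assumes generation and fuel have already been combined into one monthly table per unit; the column names (unit_id, report_date, net_generation_mwh, fuel_consumed_mmbtu) and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly records for one unit; only December is reported.
df = pd.DataFrame(
    {
        "unit_id": ["A"] * 12,
        "report_date": pd.date_range("2019-01-01", periods=12, freq="MS"),
        "net_generation_mwh": [np.nan] * 11 + [1200.0],
        "fuel_consumed_mmbtu": [np.nan] * 11 + [12000.0],
    }
)
value_cols = ["net_generation_mwh", "fuel_consumed_mmbtu"]

# Drop only the months where BOTH values are missing. Months with just one
# value missing are kept, so genuine gaps still produce N/A below.
valid = df.dropna(subset=value_cols, how="all")
valid = valid.assign(report_year=valid["report_date"].dt.year)

# Aggregate the remaining months with a sum that still propagates N/A.
annual = (
    valid.groupby(["unit_id", "report_year"])[value_cols]
    .agg(lambda col: col.sum(skipna=False))
    .reset_index()
)
annual["heat_rate_mmbtu_mwh"] = (
    annual["fuel_consumed_mmbtu"] / annual["net_generation_mwh"]
)
print(annual)
```

Keeping the strict sum after the filter means a month with fuel but no generation (or the reverse) would still yield N/A for the year, so only the fully empty months are ignored.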

gschivley commented 1 year ago

Forgot to include above. This is using v0.5.0 of the software and database.

❯ conda list pudl
# packages in environment at /opt/miniconda3/envs/powergenome:
#
# Name                    Version                   Build  Channel
catalystcoop.pudl         0.5.0              pyhd8ed1ab_3    conda-forge