catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Values of fuel_mmbtu_per_unit in aggregated gf_eia923 are wrong #389

Closed zaneselvans closed 4 years ago

zaneselvans commented 4 years ago

Describe the bug

If you create a PudlTabl output object with freq="MS" and then use it to generate a gf_eia923 output dataframe, some of the values of fuel_mmbtu_per_unit are wrong. In particular, for records pertaining to natural gas, a significant fraction of them have values of about 0.5 instead of 1.0 -- it seems like they've been cut in half accidentally by the aggregation process.

Bug Severity

High: This is clearly just... wrong -- we're outputting bad data.

To Reproduce

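import pudl
import sqlalchemy as sa
import matplotlib.pyplot as plt
# Connect to the local PUDL database using the default workspace settings.
pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
# Build a monthly-aggregated output object and the generation fuel table.
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine, freq="MS")
gf_eia923 = pudl_out.gf_eia923()
# Histogram of heat content per unit, restricted to natural gas records.
gas = gf_eia923[gf_eia923.fuel_type_code_pudl == "gas"]
plt.hist(gas.fuel_mmbtu_per_unit, bins=100, range=(0, 2))
plt.show()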

[Image: bad_gas, the histogram produced by the code above]

Expected behavior

The distribution of aggregated fuel heat content should be very similar to that of the original data -- in this case for natural gas, it should be unimodal with a median value just a little bit above 1.0 mmBTU/mcf.
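For reference, here is a minimal sketch of why the aggregated values should stay close to the originals. It is not the PUDL implementation and does not diagnose the cause of this bug. It uses made-up toy records, and the column names `fuel_consumed_units` and `fuel_consumed_mmbtu` are assumptions for illustration. If heat content per unit is recomputed as total heat divided by total units within each group, the result is a quantity-weighted average and has to fall between the smallest and largest input values.

```python
import pandas as pd

# Toy records (not real EIA data): two gas rows for one plant in one month.
# Column names other than fuel_mmbtu_per_unit are assumed for illustration.
records = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "report_date": pd.to_datetime(["2018-01-01", "2018-01-01"]),
    "fuel_consumed_units": [100.0, 300.0],   # mcf
    "fuel_mmbtu_per_unit": [1.02, 1.04],     # mmBTU/mcf
})
records["fuel_consumed_mmbtu"] = (
    records["fuel_consumed_units"] * records["fuel_mmbtu_per_unit"]
)

# Sum the extensive quantities, then recompute the ratio per group.
grouped = records.groupby(["plant_id_eia", "report_date"], as_index=False)[
    ["fuel_consumed_units", "fuel_consumed_mmbtu"]
].sum()
grouped["fuel_mmbtu_per_unit"] = (
    grouped["fuel_consumed_mmbtu"] / grouped["fuel_consumed_units"]
)

aggregated = grouped["fuel_mmbtu_per_unit"].iloc[0]
print(aggregated)
```

With these toy numbers the aggregate is 414 mmBTU over 400 mcf, so a value near 0.5 could not come out of this kind of aggregation when every input is near 1.0.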

zaneselvans commented 4 years ago

In terms of overall heat content, it turns out the points at 0.1 mmBTU/mcf are much more important than the larger number of records (with very little heat content) at 0.5 mmBTU/mcf, and they show up across the frc_eia923 and bf_eia923 tables as well. But since one of the two columns involved is probably the one with the errors in it, who knows which one is more important in reality. Further investigation of the original data is required.
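A rough sketch of the comparison described above, weighting records by heat content rather than counting them. The helper `heat_share_by_bin`, the bin edges, the `fuel_consumed_mmbtu` column name, and the toy numbers are all assumptions for illustration, not values taken from the EIA data.

```python
import pandas as pd

def heat_share_by_bin(df, edges):
    """Share of total heat content falling in each fuel_mmbtu_per_unit bin."""
    bins = pd.cut(df["fuel_mmbtu_per_unit"], bins=edges)
    heat = df.groupby(bins, observed=False)["fuel_consumed_mmbtu"].sum()
    return heat / heat.sum()

# Toy data: three small records near 0.5, one large record at 0.1,
# and one ordinary record just above 1.0 mmBTU/mcf.
toy = pd.DataFrame({
    "fuel_mmbtu_per_unit": [0.5, 0.5, 0.5, 0.1, 1.03],
    "fuel_consumed_mmbtu": [1.0, 1.0, 1.0, 50.0, 47.0],
})

shares = heat_share_by_bin(toy, edges=[0.0, 0.25, 0.75, 2.0])
print(shares)
```

In this toy case the 0.5 bin holds three of the five records but only 3% of the heat, while the single record at 0.1 accounts for half of it. Running the same grouping on the real tables would show which cluster matters more by heat content.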

cmgosnell commented 4 years ago

Duplicate/subsumed within #391