catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Excessively high mercury content in coal #390

Closed zaneselvans closed 4 years ago

zaneselvans commented 4 years ago

Looking at the EIA 923 Fuel Receipts and Costs table for 2009-2018, we've found about 500 million tons of coal, spread across ~11,100 reported deliveries, with an implausibly high reported mercury content of 9.0 ppm. The value first shows up in January 2012 and last appears in February 2016, but virtually all of the bad entries are for deliveries made in 2012. The records affect ~150 utilities from all over the US, in deliveries from more than 500 different mines. The hard start and stop dates, and the lack of any association with a particular utility, mine, or plant, make me suspect it's not a normal reporting error, and that it might instead be a data entry or processing error on the EIA side. For example, check out the EIA 923 Fuel Receipts and Costs data for Plant ID 3 (Barry) for 2012. Every listed coal delivery has 9.0 ppm of mercury, while USGS (FS095-01) suggests that US coal has at most something like 0.25 ppm mercury.

Currently this causes the frc_eia923 data validation for coal mercury content to xfail.

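import matplotlib.pyplot as plt
import pudl

# Assumes pudl_engine is an existing database connection to the PUDL DB.
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)
frc_eia923 = pudl_out.frc_eia923()
# Select deliveries with an implausibly high reported mercury content.
bad_hg = frc_eia923.query("mercury_content_ppm>=1.0")
# Histogram of delivery dates in roughly monthly bins, weighted by quantity delivered.
plt.hist(bad_hg.report_date, range=("2011-01-01", "2018-12-31"), bins=97, weights=bad_hg.fuel_qty_units)
plt.ylabel("Coal delivered [tons]")
plt.title("Coal w/ Hg reporting errors (Hg >= 1.0ppm) in EIA 923")
plt.savefig("bad_hg.png")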

[Figure: bad_hg.png, histogram of coal delivered with reported Hg >= 1.0 ppm, by report date]
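To check how the suspect records are distributed over time without relying on the plot, a per-year tally can help. The following is a minimal sketch on a toy DataFrame, not the real EIA 923 data: the column names `report_date`, `mercury_content_ppm`, and `fuel_qty_units` come from the snippet in this issue, while the function name `summarize_bad_hg` and the toy values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for frc_eia923; the values are invented for illustration only.
toy_frc = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2012-01-01", "2012-06-01", "2013-03-01", "2016-02-01"]
        ),
        "mercury_content_ppm": [9.0, 9.0, 0.08, 9.0],
        "fuel_qty_units": [1000.0, 2500.0, 4000.0, 500.0],
    }
)


def summarize_bad_hg(frc, threshold_ppm=1.0):
    """Count suspect deliveries and sum delivered tons per report year."""
    bad = frc[frc["mercury_content_ppm"] >= threshold_ppm]
    return bad.groupby(bad["report_date"].dt.year).agg(
        deliveries=("mercury_content_ppm", "size"),
        tons=("fuel_qty_units", "sum"),
    )


summary = summarize_bad_hg(toy_frc)
print(summary)
```

On the real table, the same function could be called as `summarize_bad_hg(frc_eia923)`, assuming `report_date` is a datetime column.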

zaneselvans commented 4 years ago

Wrote to Laura Martin at EIA about this issue on 2019-10-15.

zaneselvans commented 4 years ago

EIA has confirmed that there's a lack of QA on some of the older coal mercury data, and these values, roughly 40x higher than would be reasonable, are almost certainly just bad data. Rather than confuse folks or generate bad analyses, I propose that we replace any value greater than 1.0 ppm (which is still ~4x the max Hg content listed by USGS for any US coal basin) with an NA value.
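One way the proposed replacement could look in pandas is sketched below. This is only an illustration of the idea, not the change as implemented in PUDL: the helper name `null_implausible_hg` and the constant `HG_MAX_PPM` are hypothetical, and the toy DataFrame stands in for the real `frc_eia923` table.

```python
import numpy as np
import pandas as pd

HG_MAX_PPM = 1.0  # threshold proposed in this issue; the name is hypothetical


def null_implausible_hg(frc, max_ppm=HG_MAX_PPM):
    """Return a copy with mercury_content_ppm values above max_ppm set to NaN."""
    out = frc.copy()
    hg = out["mercury_content_ppm"]
    # Series.where keeps values where the condition holds and fills NaN elsewhere.
    out["mercury_content_ppm"] = hg.where(hg <= max_ppm)
    return out


# Toy data for illustration only.
toy_frc = pd.DataFrame({"mercury_content_ppm": [0.05, 1.0, 9.0, np.nan]})
cleaned = null_implausible_hg(toy_frc)
print(cleaned)
```

Values exactly equal to the threshold are kept, matching the "greater than 1.0 ppm" wording above, and the input DataFrame is left unmodified so the raw values remain available for comparison.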