NREL / foundational-industry-energy-data

The Foundational Industry Energy Dataset (FIED) is a unit-level characterization of energy use in the U.S. industrial sector.
https://nrel.github.io/foundational-industry-energy-data/

Investigate and fix rogue GHG estimates #10

Closed by calmc 3 weeks ago

calmc commented 1 month ago

There are instances where the FIED reports NEI-related GHG emissions but no associated NEI-related energy estimates. It isn't clear where these GHG values come from, i.e., whether they're calculated or pulled directly from the underlying NEI data. The GHG estimates are also far too large.

Here's an example of the issue for units in California (but note that the issue may not be limited to California facilities): [image]
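A check for this symptom (a GHG estimate present with no matching energy estimate) could be sketched as follows. This is only an illustration: the field names `unit_id`, `energy_mj`, and `ghg_tonnes_co2e` and the sample records are hypothetical and are not the FIED column names or data.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_orphaned_ghg(records, energy_key="energy_mj", ghg_key="ghg_tonnes_co2e"):
    """Return records that report a GHG estimate but no energy estimate.

    The key names are hypothetical placeholders, not FIED column names.
    """
    return [
        r for r in records
        if not is_missing(r.get(ghg_key)) and is_missing(r.get(energy_key))
    ]


# Made-up records for illustration only.
records = [
    {"unit_id": "A", "energy_mj": 1.2e6, "ghg_tonnes_co2e": 70.0},
    {"unit_id": "B", "energy_mj": float("nan"), "ghg_tonnes_co2e": 9.9e9},
    {"unit_id": "C", "energy_mj": None, "ghg_tonnes_co2e": None},
]

orphans = find_orphaned_ghg(records)
orphan_ids = [r["unit_id"] for r in orphans]
print(orphan_ids)
```

Running a check like this across all states, rather than only California, would show how widespread the issue is.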

calmc commented 3 weeks ago

This bug was traced back to the check_estimates method. This method flags NEI-derived energy values that exceed the maximum energy value derived from industrial facilities reporting to the 2017 GHGRP (~7.924e10 MJ). As written, the method first removed all energy estimates for any unit with a derived energy value above the GHGRP max. These units were then populated with secondary estimates calculated using the unit's maximum design capacity (if available), or using the minimum derived energy estimate (assuming it was less than the GHGRP max). Not all units were covered by these approaches, however, and the final dataset left the energy estimates for the remaining units as NaNs. Additionally, the method did not re-adjust the GHG estimates, many of which are calculated using the original derived energy values, so flagged units kept GHG estimates based on the excessive energy values.
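The behavior described above can be illustrated with a small sketch. It is not the FIED implementation of check_estimates: the function name `check_estimates_sketch`, the dictionary keys, and the fallback order (capacity-based estimate first, then the minimum derived value) are assumptions made for illustration. Only the ~7.924e10 MJ threshold comes from this issue.

```python
import math

GHGRP_MAX_MJ = 7.924e10  # approximate 2017 GHGRP maximum cited in this issue


def check_estimates_sketch(unit):
    """Mimic the described behavior for one unit; NOT the actual FIED code.

    `unit` is a dict with hypothetical keys:
      derived_energy_mj   list of NEI-derived energy estimates
      capacity_energy_mj  estimate from max design capacity, or None
      ghg_tonnes_co2e     GHG estimate computed from the derived energy
    """
    derived = unit["derived_energy_mj"]
    if max(derived) <= GHGRP_MAX_MJ:
        return dict(unit)  # nothing flagged, unit is left unchanged

    fixed = dict(unit)
    # Step 1: drop every derived energy estimate for the flagged unit.
    fixed["derived_energy_mj"] = []
    # Step 2: fall back to a secondary estimate where one exists.
    if unit.get("capacity_energy_mj") is not None:
        fixed["energy_mj"] = unit["capacity_energy_mj"]
    elif min(derived) < GHGRP_MAX_MJ:
        fixed["energy_mj"] = min(derived)
    else:
        fixed["energy_mj"] = float("nan")  # no fallback available
    # The GHG estimate is never recomputed, so it stays tied to the
    # original (excessive) derived energy values.
    return fixed


# Made-up unit: every derived value exceeds the max, no design capacity.
unit = {
    "unit_id": "B",
    "derived_energy_mj": [9.0e10, 2.0e11],
    "capacity_energy_mj": None,
    "ghg_tonnes_co2e": 5.0e9,
}
result = check_estimates_sketch(unit)
print(result["energy_mj"], result["ghg_tonnes_co2e"])
```

In this sketch the unit ends up with a NaN energy value and an unchanged GHG estimate, which is the combination reported at the top of the issue.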

The recommendation is to stop using the check_estimates method and focus attention on the emissions factors (or other aspects of the reported NEI data) that may be the source of such large derived energy values. Problems with the underlying data should be caught earlier in the calculation process and addressed before they propagate to other results (e.g., GHG estimates).

Although only 537 units (corresponding to 165 facilities) are flagged for excessive derived energy values, they have an outsized effect not only on aggregated energy and GHG emissions values, but also on confidence in the approach and the final estimates themselves.
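One way to quantify that outsized effect is to count the flagged units and facilities and compute their share of total derived energy. The sketch below uses made-up values and hypothetical field names (`facility_id`, `derived_energy_mj`). It does not reproduce the 537-unit and 165-facility figures.

```python
GHGRP_MAX_MJ = 7.924e10  # approximate 2017 GHGRP maximum cited in this issue


def summarize_flagged(units, threshold=GHGRP_MAX_MJ):
    """Count flagged units/facilities and their share of total derived energy."""
    flagged = [u for u in units if u["derived_energy_mj"] > threshold]
    total = sum(u["derived_energy_mj"] for u in units)
    flagged_total = sum(u["derived_energy_mj"] for u in flagged)
    return {
        "units": len(flagged),
        "facilities": len({u["facility_id"] for u in flagged}),
        "energy_share": flagged_total / total if total else 0.0,
    }


# Made-up units for illustration only.
units = [
    {"facility_id": "F1", "unit_id": "u1", "derived_energy_mj": 1.0e11},
    {"facility_id": "F1", "unit_id": "u2", "derived_energy_mj": 2.0e11},
    {"facility_id": "F2", "unit_id": "u3", "derived_energy_mj": 1.0e9},
]

summary = summarize_flagged(units)
print(summary)
```

In this toy data, two flagged units at a single facility account for more than 99% of the total derived energy, which shows how a few extreme values can dominate an aggregate.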