TrentonBush closed this issue 1 year ago.
My guess is that this is due to, or at least related to, #509: once upon a time we thought the sector code would come authoritatively from EIA 860, so we preemptively dropped it from the EIA 923 data rather than feeding it into the harvesting process. @knordback has been poking at this issue in #2169.
I suspect that simply retaining the sector codes in all of the EIA 923 input tables would give us much better coverage coming out of the harvesting step.
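To make the harvesting argument concrete, here is a minimal Python sketch of consensus-style harvesting. It assumes each input table contributes `(plant_id_eia, year, sector_id_eia)` observations and that the harvested value is the most common non-null code, accepted only if it covers at least 70% of the non-null reports. The table names, tuple layout, and 0.7 threshold are illustrative assumptions, not PUDL's actual harvesting code. The point it shows: a plant with no sector in EIA 860 picks one up once the EIA 923 tables are pooled in.

```python
from collections import Counter
from typing import Optional

# One observation per row: (plant_id_eia, year, sector_id_eia or None).
Row = tuple[int, int, Optional[int]]


def harvest_sector(reports: list[Optional[int]], threshold: float = 0.7) -> Optional[int]:
    """Pick the most common non-null sector code if it is consistent enough."""
    values = [v for v in reports if v is not None]
    if not values:
        return None  # no table reported a sector for this plant-year
    value, count = Counter(values).most_common(1)[0]
    # Hypothetical consistency rule: the winner must cover >= threshold of non-null reports.
    return value if count / len(values) >= threshold else None


def harvest_all(tables: dict[str, list[Row]], threshold: float = 0.7) -> dict[tuple[int, int], Optional[int]]:
    """Pool sector reports from every input table, then harvest one value per plant-year."""
    pooled: dict[tuple[int, int], list[Optional[int]]] = {}
    for rows in tables.values():
        for plant_id, year, sector in rows:
            pooled.setdefault((plant_id, year), []).append(sector)
    return {key: harvest_sector(reports, threshold) for key, reports in pooled.items()}


# Hypothetical toy inputs; table names are illustrative only.
eia860_only = {"plants_eia860": [(1, 2009, 1), (2, 2009, None)]}
with_923 = {
    **eia860_only,
    "generation_fuel_eia923": [(1, 2009, 1), (2, 2009, 2), (3, 2009, 3)],
    "fuel_receipts_costs_eia923": [(2, 2009, 2), (3, 2009, 3)],
}
print(harvest_all(eia860_only))  # {(1, 2009): 1, (2, 2009): None}
print(harvest_all(with_923))     # {(1, 2009): 1, (2, 2009): 2, (3, 2009): 3}
```

Requiring a consistency threshold, rather than just taking any non-null value, keeps a single stray report from overriding the others when sources disagree.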
I can focus on this one if desired, since it's more targeted and sounds higher priority than other parts of #509.
I think this would be a great targeted example. IIRC the EIA sector codes show up in almost every EIA-923 table that references plants.
Describe the issue
Many plants are missing sector categories and codes (e.g., IPP vs. utility) even though this information exists in EIA 923. This is because PUDL currently takes sector information only from EIA 860, which has different plant coverage than EIA 923. The problem is most severe prior to 2009 (see table below).
For example, in 2009 the EIA 860 plant data includes only 6,816 plants, whereas EIA 923 covers 9,907. Furthermore, 203 of the EIA 860 sector codes are missing, compared to 0 in EIA 923. Prior to 2009, EIA 860 does not report sector categories at all.
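A sketch of how a per-year coverage comparison like this could be computed, using Python's built-in `sqlite3` against toy in-memory tables. The table names `plants_eia860` and `plants_eia923`, the columns `report_year` and `sector_id_eia`, and every row are hypothetical stand-ins, not the PUDL schema or the real counts, and this is not the query referenced in this issue.

```python
import sqlite3

# Toy in-memory stand-ins; table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE plants_eia860 (plant_id_eia INTEGER, report_year INTEGER, sector_id_eia INTEGER);
    CREATE TABLE plants_eia923 (plant_id_eia INTEGER, report_year INTEGER, sector_id_eia INTEGER);
    -- Toy EIA 860 rows: no sector before 2009 and fewer plants than EIA 923.
    INSERT INTO plants_eia860 VALUES (1, 2008, NULL), (1, 2009, 1), (2, 2009, NULL);
    INSERT INTO plants_eia923 VALUES (1, 2008, 1), (1, 2009, 1), (2, 2009, 2), (3, 2009, 3);
    """
)

# Per source and year: distinct plants, and rows whose sector code is null.
COVERAGE_SQL = """
SELECT 'EIA 860' AS source, report_year,
       COUNT(DISTINCT plant_id_eia) AS n_plants,
       SUM(sector_id_eia IS NULL) AS n_missing_sector
FROM plants_eia860 GROUP BY report_year
UNION ALL
SELECT 'EIA 923', report_year,
       COUNT(DISTINCT plant_id_eia),
       SUM(sector_id_eia IS NULL)
FROM plants_eia923 GROUP BY report_year
ORDER BY report_year, source
"""

rows = conn.execute(COVERAGE_SQL).fetchall()
for row in rows:
    print(row)
```

In SQLite, `sector_id_eia IS NULL` evaluates to 0 or 1, so summing it counts the missing codes without a separate `CASE` expression.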
Impact
I noticed this while comparing fuel cost records from the `fuel_receipts_costs` table to aggregate values from the EIA bulk electricity data. The bulk data is aggregated by sector (among other dimensions), so the missing sector codes mean that I cannot use PUDL to reproduce the bulk aggregates. Even in 2011 and later, when only about 34 plants (0.3%) are missing sector codes, those plants are disproportionately active in the fuel receipts records and account for 9,869 entries (2.3%) with null sector codes.
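The "disproportionately active" point is a comparison of two shares: the fraction of plants with a null sector versus the fraction of fuel receipt records those plants produce. A minimal sketch, assuming sector is a per-plant attribute and each receipt is attributed to one plant; the function name and all toy data are hypothetical.

```python
from typing import Optional


def null_sector_shares(
    plant_sectors: dict[int, Optional[int]],
    receipt_plant_ids: list[int],
) -> tuple[float, float]:
    """Return (share of plants with a null sector, share of receipts from those plants)."""
    missing = {plant for plant, sector in plant_sectors.items() if sector is None}
    plant_share = len(missing) / len(plant_sectors)
    # Each fuel receipt record is attributed to the plant that received the fuel.
    record_share = sum(p in missing for p in receipt_plant_ids) / len(receipt_plant_ids)
    return plant_share, record_share


# Hypothetical toy data: plant 3 has no sector code but receives most deliveries.
plant_sectors = {1: 1, 2: 2, 3: None, 4: 1}
receipts = [3, 3, 3, 1, 2, 4, 3, 3]
plant_share, record_share = null_sector_shares(plant_sectors, receipts)
print(plant_share, record_share)  # 0.25 0.625

# Ratio of record share to plant share, using the rounded percentages above (2.3% vs 0.3%).
print(round(2.3 / 0.3, 1))  # 7.7
```

Applied to the reported figures, the null-sector plants contribute roughly 7.7 times more records than their share of plants alone would suggest; since both inputs are rounded, treat that ratio as approximate.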
To Reproduce
The SQL query to reproduce the table above is:
PUDL Version
This issue is present on the dev branch as of today (last commit hash: db5ce8f2b5ee062708d4700fe6a6391c23a1005f)