catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Net Generation data missing from generation_eia923.csv datapackage #595

Closed grgmiller closed 1 year ago

grgmiller commented 4 years ago

I was working on https://github.com/catalyst-cooperative/pudl/issues/245 and attempting to calculate gross-to-net generation ratios using gross generation data from CEMS and net generation data from EIA-923, but I noticed that there is missing data in the generation_eia923.csv data package. I was looking at plant_id_eia = 55397, which has three generators: ST1, CT1, and CT2. Data for these three generators is present in generation_eia923 up to 2017, but the 2018 data only includes ST1. I went back and checked the original 2018 EIA-923 Excel file, and it contains data for all three generators.

It seems like for some reason, the ETL process might be dropping some of the good data. Looking at the generation function in the eia923.py transform module, I do not see an obvious reason why these records would have been dropped, as the generator_id is not missing, and it has non-zero data available.

I noticed this first while using the v1.0 data release of pudl-eia860-eia923-epacems, but I downloaded v1.1 of pudl-eia860-eia923-epacems and the problem persists there as well. I have not tried using pudl-eia860-eia923, but I am assuming that the eia923 data should be the same in both?
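For anyone who wants to reproduce this kind of check, here is a minimal sketch of comparing generator IDs per plant and year between the raw spreadsheet and the processed table. The function name `missing_generators` and the row layout are assumptions for illustration, not part of PUDL, and the rows below are made up to mirror the plant 55397 case rather than loaded from real files.

```python
from collections import defaultdict


def missing_generators(raw_rows, processed_rows):
    """Map (plant_id_eia, year) to generator IDs found in raw but not in processed."""

    def index(rows):
        # Collect the set of generator IDs seen for each plant-year.
        seen = defaultdict(set)
        for row in rows:
            seen[(row["plant_id_eia"], row["year"])].add(row["generator_id"])
        return seen

    raw, processed = index(raw_rows), index(processed_rows)
    result = {}
    for key, gens in raw.items():
        dropped = gens - processed.get(key, set())
        if dropped:
            result[key] = sorted(dropped)
    return result


# Hypothetical rows: all three generators in the raw data for both years,
# but only ST1 in the processed 2018 data.
raw_rows = [
    {"plant_id_eia": 55397, "year": year, "generator_id": gen}
    for year in (2017, 2018)
    for gen in ("ST1", "CT1", "CT2")
]
processed_rows = [
    {"plant_id_eia": 55397, "year": 2017, "generator_id": gen}
    for gen in ("ST1", "CT1", "CT2")
] + [{"plant_id_eia": 55397, "year": 2018, "generator_id": "ST1"}]

print(missing_generators(raw_rows, processed_rows))
# {(55397, 2018): ['CT1', 'CT2']}
```

With real data, the two row lists would come from the EIA-923 generator page and from generation_eia923.csv respectively.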

zaneselvans commented 4 years ago

Are you saying that the net generation for those generators is 0.0, or null? That does seem weird. Let me play with the live DB a little bit and see what it looks like.

grgmiller commented 4 years ago

@zaneselvans What I'm saying is that there is no record at all for generators CT1 and CT2 in the 2018 datapackage, even though they both are present in the raw 923 data. It is as if those generators were dropped somehow.

grgmiller commented 3 years ago

I've been looking into this a bit more and there seem to be a couple of issues with the data:

1) Sometimes in EIA-923 a plant will appear in "Page 1 Generation and Fuel Data" with nonzero net generation data by fuel, but it is missing from "Page 4 Generator Data" (for example plant 118 in 2017). Not sure if this is an existing data issue that you are aware of with the EIA-923 raw data?

2) Sometimes a plant appears in both "Generation and Fuel Data" and "Generator Data" in the raw data file with non-zero net generation, but for some reason it does not make it to the generation_eia923 data package file. Not sure if this issue has been explored further since May, but if not, I can try to take a look at what's going on.
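The two cases above can be separated with simple set arithmetic on plant IDs. This is only a sketch: `classify_plants` and its arguments are hypothetical names, and the IDs in the example are illustrative (118 echoes the case mentioned above, 99999 is invented).

```python
def classify_plants(page1_plants, page4_plants, output_plants):
    """Split plants into the two problem cases described above.

    page1_plants:  plant IDs on "Page 1 Generation and Fuel Data"
    page4_plants:  plant IDs on "Page 4 Generator Data"
    output_plants: plant IDs present in the processed generation table
    """
    page1, page4, output = set(page1_plants), set(page4_plants), set(output_plants)
    return {
        # Case 1: reported by fuel, but never reported by generator.
        "page1_only": sorted(page1 - page4),
        # Case 2: present on both raw pages, yet absent after the ETL.
        "dropped_in_etl": sorted((page1 & page4) - output),
    }


# Illustrative IDs only.
result = classify_plants(
    page1_plants=[118, 493, 99999],
    page4_plants=[493, 99999],
    output_plants=[493],
)
print(result)
# {'page1_only': [118], 'dropped_in_etl': [99999]}
```

Case 1 points at how the raw form is reported, while case 2 points at the transform step, so it helps to count them separately.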

MichaelTiemannOSC commented 3 years ago

Plants 493 and 508 make for an interesting comparison using the 2012 EIA-923 spreadsheet report. Plant 493 represents its generation in both the "Page 1 Generation and Fuel Data" and the "Page 4 Generator Data" and the net generation between the two pages sums correctly, though Page 1 adds up only the two fuel types driving all the Prime Movers, while Page 4 adds up 3 generators (1, 2, and 3) irrespective of fuel type. Plant 508 has WND generation data on Page 1 that is not present on Page 4, and the net generation of 508 does not tie between the two pages.
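A rough sketch of that tie-out check, summing net generation per plant on each page and flagging plants where the totals differ. The helper names and the MWh values are made up for illustration; only the shape of the comparison (two fuels versus three generators for 493, an extra WND row for 508) follows the description above.

```python
import math
from collections import defaultdict


def plant_totals(rows):
    """Sum net generation (MWh) per plant, ignoring fuel or generator detail."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["plant_id_eia"]] += row["net_generation_mwh"]
    return dict(totals)


def compare_pages(page1_rows, page4_rows, rel_tol=1e-6):
    """Report per-plant totals from both pages and whether they tie."""
    p1, p4 = plant_totals(page1_rows), plant_totals(page4_rows)
    report = {}
    for plant in sorted(set(p1) | set(p4)):
        a, b = p1.get(plant, 0.0), p4.get(plant, 0.0)
        report[plant] = {
            "page1": a,
            "page4": b,
            "ties": math.isclose(a, b, rel_tol=rel_tol),
        }
    return report


# Made-up numbers: Page 1 is keyed by fuel, Page 4 by generator.
page1_rows = [
    {"plant_id_eia": 493, "fuel": "NG", "net_generation_mwh": 100.0},
    {"plant_id_eia": 493, "fuel": "DFO", "net_generation_mwh": 50.0},
    {"plant_id_eia": 508, "fuel": "NG", "net_generation_mwh": 200.0},
    {"plant_id_eia": 508, "fuel": "WND", "net_generation_mwh": 40.0},
]
page4_rows = [
    {"plant_id_eia": 493, "generator_id": "1", "net_generation_mwh": 60.0},
    {"plant_id_eia": 493, "generator_id": "2", "net_generation_mwh": 60.0},
    {"plant_id_eia": 493, "generator_id": "3", "net_generation_mwh": 30.0},
    {"plant_id_eia": 508, "generator_id": "1", "net_generation_mwh": 200.0},
]

report = compare_pages(page1_rows, page4_rows)
for plant, info in report.items():
    print(plant, info)
```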

I think that 'generation_eia923' is a bit of a stretch--it is called "Generators" in the 923 spreadsheet, and it looks like generaTION data is secondary to its generaTOR purpose.

In any case, the documentation should definitely call out which should be used for which purpose, and it would be good to see how, as this project matures, it can synthesize a true best-of-breed rendering of this primary data for the widest range of valid use cases.

zaneselvans commented 3 years ago

We definitely need to compile more detailed table-level metadata and incorporate it into the data dictionary so that it's clear to everyone what they should expect to find in each table. The only unique information reported in the "generaTOR" table in EIA 923 is the "generaTION" -- the rest of it is generator attributes whose home is actually the generator, plant, and utility tables, which come from the EIA 860, which is why we renamed it generaTION. The generator attributes reported there (and everywhere else too...) are harmonized late in the ETL process so that we can iron out any inconsistently reported values, and in the end the only unique information that is extracted from this EIA 923 page is the monthly generation per generator.

But as you've noticed, a large portion of the generation isn't actually reported there (IIRC it's around 40% of the total MWh that are "missing" -- because not all generators meet the reporting requirements for that part of the form) whereas the generation_fuel_eia923 table (and the Generation and Fuel Data page in the EIA 923 spreadsheet) reports much more generation... but not (insert scream here) on the basis of generator. Instead it's broken down by plant, prime mover, and fuel. Helpfully it is broken down by fuel which goes to electricity production, and fuel which goes to high temperature process heat via CHP systems.

The only plants whose net generation should match exactly between the two tables are those for which ALL of their generators meet the reporting requirements for the generation_eia923 table, and also don't have any associated CHP.
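As a sketch of the two ideas above: the share of plant-level generation that has no generator-level record, and the condition under which the two tables should tie exactly. Both function names and their inputs are hypothetical, and the numbers are made up; nothing here is PUDL's actual API.

```python
def missing_share(generation_fuel_mwh, generation_mwh):
    """Fraction of plant-level net generation with no generator-level record."""
    if generation_fuel_mwh <= 0:
        raise ValueError("plant-level total must be positive")
    return (generation_fuel_mwh - generation_mwh) / generation_fuel_mwh


def expect_exact_match(generators_report, has_chp):
    """True only if every generator reports at generator level and there is no CHP.

    generators_report: one boolean per generator at the plant, True when that
    generator meets the reporting requirements for the generator page.
    """
    return bool(generators_report) and all(generators_report) and not has_chp


# Made-up totals: 1000 MWh reported by fuel, 600 MWh reported by generator.
print(missing_share(1000.0, 600.0))               # 0.4
print(expect_exact_match([True, True, True], False))  # True
print(expect_exact_match([True, False], False))       # False
print(expect_exact_match([True, True], True))         # False
```

Plants for which `expect_exact_match` is false are the ones where a mismatch between the tables is expected rather than a sign of dropped records.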

And then on top of all that, the Generation and Fuel Data spreadsheet reports "total fuel consumption" in terms of heat content (MMBTU), even when it's not a thermal plant -- wind, PV, hydro, etc. -- using some arbitrary average fossil plant thermal efficiency to convert the energy output into an "equivalent" heat input.
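To make the "equivalent" heat input concrete, here is a small sketch of the arithmetic. The only fixed number is the unit conversion (1 MWh is about 3.412142 MMBtu); the efficiency passed in is a placeholder assumption, not the value EIA actually applies, and the function name is hypothetical.

```python
MMBTU_PER_MWH = 3.412142  # unit conversion: 1 MWh of energy expressed in MMBtu


def equivalent_heat_input_mmbtu(net_generation_mwh, assumed_efficiency):
    """Heat input a thermal plant of the given efficiency would need for this output.

    assumed_efficiency is a fraction in (0, 1]; the value used in the example
    below is purely illustrative.
    """
    if not 0 < assumed_efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return net_generation_mwh * MMBTU_PER_MWH / assumed_efficiency


# A wind plant producing 100 MWh, "converted" at a hypothetical 35% efficiency.
print(equivalent_heat_input_mmbtu(100.0, 0.35))
```

Dividing the reported MMBtu by the reported MWh for a non-thermal plant would recover whatever implied conversion factor was applied in a given year.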

zaneselvans commented 1 year ago

Welp, not 100% sure what ultimately fixed this, but this data is thankfully showing up just fine now.