catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Integrate respondent frequency in EIA-923 tables #1757

Open grgmiller opened 1 year ago

grgmiller commented 1 year ago

Certain plants report data to the generator table and boiler fuel table only on an annual basis. In this case, their annual total is reported as a single value in December, and the other 11 months are reported as missing values. Previously, this annually reported data was used to allocate generation and fuel data only in December, instead of across the entire year. The raw EIA-923 table includes a 'Respondent Frequency' column (with a code of 'A' for annual and 'M' for monthly). In theory, this column could be used to identify these plants, but it is not currently included in PUDL.
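If the column were carried through, flagging these plants could look something like the minimal sketch below. It assumes hypothetical snake_case field names (`plant_id_eia`, `report_month`, `respondent_frequency`, `net_generation_mwh`) and uses toy records, not real EIA-923 data. It also shows a fallback heuristic for when the frequency code is unavailable: a plant whose only non-missing value falls in December.

```python
from collections import defaultdict


def annual_respondents(records):
    """Return plant IDs whose raw 'Respondent Frequency' code is 'A' (annual).

    Field names here are hypothetical stand-ins for the raw EIA-923 column.
    """
    return {r["plant_id_eia"] for r in records if r.get("respondent_frequency") == "A"}


def december_only_plants(records, value_col="net_generation_mwh"):
    """Heuristic: plants whose only non-missing monthly value is in December."""
    months_with_data = defaultdict(set)
    for r in records:
        if r.get(value_col) is not None:
            months_with_data[r["plant_id_eia"]].add(r["report_month"])
    return {pid for pid, months in months_with_data.items() if months == {12}}


# Toy data: plant 1 reports annually (December only), plant 2 reports monthly.
records = [
    {"plant_id_eia": 1, "report_month": m, "respondent_frequency": "A",
     "net_generation_mwh": 1200.0 if m == 12 else None}
    for m in range(1, 13)
] + [
    {"plant_id_eia": 2, "report_month": m, "respondent_frequency": "M",
     "net_generation_mwh": 100.0}
    for m in range(1, 13)
]

print(annual_respondents(records))    # {1}
print(december_only_plants(records))  # {1}
```

The heuristic can disagree with the reported code (e.g. a monthly respondent that really did report nothing until December), so the explicit column is the more reliable signal.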

Whaaat? Well we should add that. Where does it show up in the spreadsheets? What entity is it associated with (generator, plant, utility?)

That's wild that there are some December-only records covering the whole year. How annoying. I wonder which other tables have this dynamic going on.

Looks like it shows up in at least:

Is a substantial fraction of overall fuel / generation being reported annually?
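One way to answer this is to compute the share of total reported generation that comes from annual respondents. The sketch below assumes the same hypothetical field names as the raw-column idea (`respondent_frequency`, `net_generation_mwh`) and uses toy records only; it skips missing values so the empty January to November rows of annual reporters don't count.

```python
def annual_share(records, value_col="net_generation_mwh"):
    """Fraction of the summed value reported by annual ('A') respondents."""
    total = 0.0
    annual = 0.0
    for r in records:
        value = r.get(value_col)
        if value is None:  # missing months contribute nothing
            continue
        total += value
        if r.get("respondent_frequency") == "A":
            annual += value
    return annual / total if total else 0.0


# Toy data: one annual respondent (1200 MWh in December), one monthly (100 MWh/month).
records = [
    {"plant_id_eia": 1, "report_month": m, "respondent_frequency": "A",
     "net_generation_mwh": 1200.0 if m == 12 else None}
    for m in range(1, 13)
] + [
    {"plant_id_eia": 2, "report_month": m, "respondent_frequency": "M",
     "net_generation_mwh": 100.0}
    for m in range(1, 13)
]

print(annual_share(records))  # 0.5
```

The same function applies to fuel consumption by passing a different `value_col`.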

To deal with this correctly it seems like we'd really need to break out the annual and monthly reporting into separate tables that have different temporal resolution.
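A rough sketch of that split, assuming hypothetical field names (`report_year`, `report_month`, `respondent_frequency`) and toy records: annual respondents contribute a single row per plant and year (the December value, which holds the annual total), while everything else stays in a monthly table. Treating every code other than 'A' as monthly is a simplification; the other frequency codes in the file layout may need their own handling.

```python
def split_by_frequency(records, value_col="net_generation_mwh"):
    """Split records into a monthly table and an annual table.

    Simplification: any code other than 'A' is treated as monthly.
    """
    monthly, annual = [], []
    for r in records:
        if r.get("respondent_frequency") == "A":
            # Annual respondents: the December row carries the annual total;
            # the empty January-November rows are dropped.
            if r["report_month"] == 12:
                annual.append({
                    "plant_id_eia": r["plant_id_eia"],
                    "report_year": r["report_year"],
                    value_col: r[value_col],
                })
        else:
            monthly.append(r)
    return monthly, annual


# Toy data for a single hypothetical year.
records = [
    {"plant_id_eia": 1, "report_year": 2020, "report_month": m,
     "respondent_frequency": "A", "net_generation_mwh": 1200.0 if m == 12 else None}
    for m in range(1, 13)
] + [
    {"plant_id_eia": 2, "report_year": 2020, "report_month": m,
     "respondent_frequency": "M", "net_generation_mwh": 100.0}
    for m in range(1, 13)
]

monthly, annual = split_by_frequency(records)
print(len(monthly), len(annual))  # 12 1
```

Keeping the annual table at yearly resolution avoids implying a monthly profile that the respondent never reported; any allocation to months would then be an explicit, separate modeling step.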

Originally posted by @zaneselvans in https://github.com/catalyst-cooperative/pudl/issues/1608#issuecomment-1183913344

Tasks (to turn into issues...)

grgmiller commented 1 year ago

According to the EIA-923 file layout sheet, the different types of respondent frequencies are:

[Image: respondent frequency codes from the EIA-923 file layout sheet]

Looking at the 2020 EIA-923 data:

Generally, this is what it looks like:

[Image]