catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 105 forks source link

Transform EIA 860 environmental equipment summary and association data #2338

Open e-belfer opened 1 year ago

e-belfer commented 1 year ago

Short summary:

From 2014-2021, important summary information on each plant's associated control equipment is compiled in the Emissions Control Equipment table. From 2009-2021, the same information on IDs, cost, equipment type and operational status for SO2 and PM control is available in the table associated with each type of equipment in the EnvrEquipment dataset. More minimal information is also available on NOx and mercury controls. To effectively incorporate information on control equipment investments and operational status throughout time, we should pull the data from both the EnvrEquip tables (6.2 D-G) and the Emissions Control Equipment table, where available. When there is more than one source of information available for a given piece of control equipment, we should compare these values and harmonize them.

The output of this result will be an Emissions Control Equipment table for 2009-2021, with a unique ID for each piece of control equipment, containing the following information:

### Tasks
- [ ] #2561 
- [ ] https://github.com/catalyst-cooperative/pudl/issues/2586

Preliminary Task Detail

(more in individual PRs)

Metadata preparation

Emissions Control Equipment Table (2014-2021)

Association tables (2009-2021): Boiler Cooling, Boiler Particulate Matter, Boiler SO2, Boiler NOx, Boiler Mercury

EnvrEquipment tables (FGD, FGP, Cooling, Stack Flue)

Denormalized boiler-control equipment table

Anticipated problems:

Additional possible tasks:

arengel commented 1 year ago

We'd be very interested in seeing if there is a way to get access to this data or ease the process of, outside of PUDL, accessing this data. In particular we are interested in the following:

  1. The in-service date for emission control equipment
  2. The capital costs of emission control equipment

This data helps understand important details about recent investment in coal plants which helps with prospects for retirement or CCS.

I'm wondering if there is a way of bringing in this data without fully cleaning it. Things like merging together all the environmental equipment association tables and the basic fixing of EIA data weirdness in emissions_control_equipment.

It probably also makes sense to connect with @mariacastillo21 since she's worked with these tables and might have suggestions about how they could be processed.

e-belfer commented 1 year ago

@arengel Together with @aesharpe and @cmgosnell I scoped out what I think is a very feasible path forward to bring this information into PUDL, and updated the issue above. Let me know if anything is unclear, or you have any questions.

Based on my work extracting the data, I think it's quite feasible to process information on in-service date, equipment costs, and associated boiler IDs into PUDL. Here are a few additional questions it would be helpful to have clarity on:

zaneselvans commented 1 year ago

With the new ETL process in Dagster we can get a rough draft of these tables into the DB easily without fully normalizing and linking them with everything else pretty easily, along the lines of what's happening to the EIA-861 tables now.

For the capital costs of emissions control systems, I think they were only reported in these tables for a few years and are no longer available from EIA. Or if they are they've been moved elsewhere.

e-belfer commented 1 year ago

@zaneselvans To my understanding, capital costs are currently reported in the Total Cost (Thousands of Dollars) column of the Emissions Control Equipment table, which has the following definition: "The nominal installed cost for the existing emissions control equipment or the anticipated cost to bring the planned piece of emissions control equipment into commercial operation"

@arengel If this is not the column you're referring to, this would of course be helpful to know!

zaneselvans commented 1 year ago

Hmm. It looks like there's also a bunch of detailed emissions equipment O&M and capital cost data in the EIA-923 Schedule 8. Maybe There was more detailed cost reporting for a couple of years in the EIA-860 and it got incorporated into EIA-923? I'm not sure where I was getting this idea from.

Oh, it looks like it was probably the flue gas particulate emissions control costs, which show up in the EIA-860 through 2012, but not in later years.

e-belfer commented 1 year ago

There is definitely more detailed cost information in Schedule 8. The FGP control costs from 2014-present are now reported in the Emissions Control Equipment tab (which has one row for each piece of control equipment), as is noted above! There's some tab and ID renaming that happens between 2013-2014.

arengel commented 1 year ago

Thanks @e-belfer!

Total Cost (Thousands of Dollars) on the Emissions Control Equipment tab of 6_1_EnviroAssoc_YXXX.xlsx is indeed the one I was referring to, I don't remember how long its been there but it is definitely there in 2021.

In terms of what I think would be useful I think we'd want a couple things:

Before saying anymore, I want to acknowledge that I am not that familiar with issues around boiler reporting and how they get associated with generators. But, it sort of seems like, at least as regards Emissions Control Equipment, sorting out the issues with differently reported past years may not be as important or even required because the table in the most recent year is complete. And if not I'd be fine with only going back to 2013 or whenever the data becomes more or less consistent.

aesharpe commented 1 year ago

@arengel, @e-belfer correct me if I'm wrong here, but I think that the main difference between getting all years vs. just 2014+ is where the columns come from. If we only want 2014+ we can take the table from the Emissions Control Equipment tab as is, and if we want older years, we'll combine columns from various tabs in the the EnviroEquip spreadsheet. They're all reported at the same granularity, so there's no need for any aggregation or complex merging. The only difference is that the earlier years of data may have less mercury and NOx data.

Let me know if there are any preliminary checks you'd like us to run to see if the older years would satisfy your needs. Otherwise we can just go ahead and integrate 2014+

arengel commented 1 year ago

I think the complex merging is with connecting emission control equipment to boilers, that's what I was proposing to avoid figuring out for now.

On going back further, it is lower priority on our side, so I'd propose sticking with the tables that are easy-ish to deal with for now.

Also, it's not obvious to me, at least for the Emissions Control Equipment tab that you need all years, it looks like the most recent year might be complete. Or it might only be complete for currently operating boilers or plants.

zaneselvans commented 1 year ago

I wouldn't be surprised if the emissions control equipment associations aren't entirely constant with time. We've been surprised in the past at just how many of the generator attributes do slowly evolve, for example the changes in the prime mover or technology type. But hopefully those instances are rare, since big reconfigurations of how the units in a plant connect to each other are also rare.

e-belfer commented 1 year ago

I agree the 2021 data should provide a relatively complete source of information, particularly thanks to the addition of the retirement date columns in more recent years of EIA data. But, in addition to the question about whether only operating boilers are covered, I think there's a question about the completeness of data for equipment retirements. We do see retirement dates going back to 1991 in the 2021 table. The survey itself, however, only asks for all equipment that was in operation in the past year, or that is expected to be in operation in the near future. It's possible that the 2021 table is complete, or that there would be some equipment retirements missed because of the framing of the survey itself. I think the workload involved in integrating 2014-2021 data won't be substantially more involved than just pulling the 2021 data, and if looking at changes in equipment over time is important this might help flesh out some more information here. Doing a quick comparison of a few years of data to the 2021 table would be a fast gut check on whether this is actually substantiated, of course. @mariacastillo21 I'd also be curious if you've seen this issue come up in your work with these tables.

Regarding the boiler associations, the proposal of making a mega-association table sounds like a great interim solution for sidestepping the m:m complications.

aesharpe commented 1 year ago

@mariacastillo21 or @arengel do you know what's up with the rows in the Emissions Control Equipment table that have all NA values for XX_control_id and acid_gas_control? Those rows still have a emission_control_equipment_type listed, but nothing else. Do you know what it means for a plant to have emissions equipment but no control ids or acid gas control? It only occurs for emission_control_equipment_type values 'SR', 'OT', 'ACI', 'LNB'. Only the LNB technology is always NA, the others sometimes have control ids or acid gas control values reported.

ANSWER: almost all of these have an operational_status_code other than OP which makes sense that they would not yet or no longer have an ID attached to them. There are only 7 of these rows that do say OP. Three of them are the LNB technology which is never associated with control ids or acid gas control. Two of them are close to the operating year and seem to get updated in later years to have control ids, and the last two (from the same plant: 4941) are only reported in 2016 and 2017 and then seem to go away.

aesharpe commented 1 year ago

Also @mariacastillo21 I'm assuming that control ids reported in the same row are part of the same equipment given the shared emission_control_equipment_cost, installation_year, and emission_control_equipment_type columns. Does that seem fair? Not all of controls ids are the same in a given row (though many of them are), and I'm assuming this difference is not important.

I don't plan on cleaning the control_id columns unless you think it's worth exploring. I do plan on doing some checks to see if the control ids are consistent over time.