Open e-belfer opened 1 year ago
We'd be very interested in seeing if there is a way to get access to this data or ease the process of, outside of PUDL, accessing this data. In particular we are interested in the following:
This data helps understand important details about recent investment in coal plants which helps with prospects for retirement or CCS.
I'm wondering if there is a way of bringing in this data without fully cleaning it. Things like merging together all the environmental equipment association tables and the basic fixing of EIA data weirdness in emissions_control_equipment
.
It probably also makes sense to connect with @mariacastillo21 since she's worked with these tables and might have suggestions about how they could be processed.
@arengel Together with @aesharpe and @cmgosnell I scoped out what I think is a very feasible path forward to bring this information into PUDL, and updated the issue above. Let me know if anything is unclear, or you have any questions.
Based on my work extracting the data, I think it's quite feasible to process information on in-service date, equipment costs, and associated boiler IDs into PUDL. Here are a few additional questions it would be helpful to have clarity on:
With the new ETL process in Dagster we can get a rough draft of these tables into the DB easily without fully normalizing and linking them with everything else pretty easily, along the lines of what's happening to the EIA-861 tables now.
For the capital costs of emissions control systems, I think they were only reported in these tables for a few years and are no longer available from EIA. Or if they are they've been moved elsewhere.
@zaneselvans To my understanding, capital costs are currently reported in the Total Cost (Thousands of Dollars)
column of the Emissions Control Equipment table, which has the following definition:
"The nominal installed cost for the existing emissions control equipment or the anticipated cost to bring the planned piece of emissions control equipment into commercial operation"
@arengel If this is not the column you're referring to, this would of course be helpful to know!
Hmm. It looks like there's also a bunch of detailed emissions equipment O&M and capital cost data in the EIA-923 Schedule 8. Maybe There was more detailed cost reporting for a couple of years in the EIA-860 and it got incorporated into EIA-923? I'm not sure where I was getting this idea from.
Oh, it looks like it was probably the flue gas particulate emissions control costs, which show up in the EIA-860 through 2012, but not in later years.
There is definitely more detailed cost information in Schedule 8. The FGP control costs from 2014-present are now reported in the Emissions Control Equipment tab (which has one row for each piece of control equipment), as is noted above! There's some tab and ID renaming that happens between 2013-2014.
Thanks @e-belfer!
Total Cost (Thousands of Dollars)
on the Emissions Control Equipment
tab of 6_1_EnviroAssoc_YXXX.xlsx
is indeed the one I was referring to, I don't remember how long its been there but it is definitely there in 2021.
In terms of what I think would be useful I think we'd want a couple things:
Emissions Control Equipment
tab. The total cost and inservice date columns are the most important but we'd also want retirement date, status, and the others.Emissions Control Equipment
tab.Cooling
, FGP
, FGD
, and Stack Flue
tabs of 6_2_EnviroEquip_YXXX.xlsx
but I'd put it as a lower priority. In those cases, if it isn't too big a pain to do a simple cleaning of those tables, I wouldn't want to stand in your way, but I wouldn't go much beyond basic cleaning (e.g. I wouldn't worry about reconciling costs between them and Emissions Control Equipment
). Users could then use the equipment association table mentioned in (ii) above to link this data to boilders and then generators.Before saying anymore, I want to acknowledge that I am not that familiar with issues around boiler reporting and how they get associated with generators. But, it sort of seems like, at least as regards Emissions Control Equipment
, sorting out the issues with differently reported past years may not be as important or even required because the table in the most recent year is complete. And if not I'd be fine with only going back to 2013 or whenever the data becomes more or less consistent.
@arengel, @e-belfer correct me if I'm wrong here, but I think that the main difference between getting all years vs. just 2014+ is where the columns come from. If we only want 2014+ we can take the table from the Emissions Control Equipment
tab as is, and if we want older years, we'll combine columns from various tabs in the the EnviroEquip
spreadsheet. They're all reported at the same granularity, so there's no need for any aggregation or complex merging. The only difference is that the earlier years of data may have less mercury and NOx data.
Let me know if there are any preliminary checks you'd like us to run to see if the older years would satisfy your needs. Otherwise we can just go ahead and integrate 2014+
I think the complex merging is with connecting emission control equipment to boilers, that's what I was proposing to avoid figuring out for now.
On going back further, it is lower priority on our side, so I'd propose sticking with the tables that are easy-ish to deal with for now.
Also, it's not obvious to me, at least for the Emissions Control Equipment
tab that you need all years, it looks like the most recent year might be complete. Or it might only be complete for currently operating boilers or plants.
I wouldn't be surprised if the emissions control equipment associations aren't entirely constant with time. We've been surprised in the past at just how many of the generator attributes do slowly evolve, for example the changes in the prime mover or technology type. But hopefully those instances are rare, since big reconfigurations of how the units in a plant connect to each other are also rare.
I agree the 2021 data should provide a relatively complete source of information, particularly thanks to the addition of the retirement date columns in more recent years of EIA data. But, in addition to the question about whether only operating boilers are covered, I think there's a question about the completeness of data for equipment retirements. We do see retirement dates going back to 1991 in the 2021 table. The survey itself, however, only asks for all equipment that was in operation in the past year, or that is expected to be in operation in the near future. It's possible that the 2021 table is complete, or that there would be some equipment retirements missed because of the framing of the survey itself. I think the workload involved in integrating 2014-2021 data won't be substantially more involved than just pulling the 2021 data, and if looking at changes in equipment over time is important this might help flesh out some more information here. Doing a quick comparison of a few years of data to the 2021 table would be a fast gut check on whether this is actually substantiated, of course. @mariacastillo21 I'd also be curious if you've seen this issue come up in your work with these tables.
Regarding the boiler associations, the proposal of making a mega-association table sounds like a great interim solution for sidestepping the m:m complications.
@mariacastillo21 or @arengel do you know what's up with the rows in the Emissions Control Equipment
table that have all NA values for XX_control_id
and acid_gas_control
? Those rows still have a emission_control_equipment_type
listed, but nothing else. Do you know what it means for a plant to have emissions equipment but no control ids or acid gas control? It only occurs for emission_control_equipment_type
values 'SR', 'OT', 'ACI', 'LNB'
. Only the LNB
technology is always NA, the others sometimes have control ids or acid gas control values reported.
ANSWER: almost all of these have an operational_status_code
other than OP
which makes sense that they would not yet or no longer have an ID attached to them. There are only 7 of these rows that do say OP
. Three of them are the LNB
technology which is never associated with control ids or acid gas control. Two of them are close to the operating year and seem to get updated in later years to have control ids, and the last two (from the same plant: 4941
) are only reported in 2016 and 2017 and then seem to go away.
Also @mariacastillo21 I'm assuming that control ids reported in the same row are part of the same equipment given the shared emission_control_equipment_cost
, installation_year
, and emission_control_equipment_type
columns. Does that seem fair? Not all of controls ids are the same in a given row (though many of them are), and I'm assuming this difference is not important.
I don't plan on cleaning the control_id columns unless you think it's worth exploring. I do plan on doing some checks to see if the control ids are consistent over time.
Short summary:
From 2014-2021, important summary information on each plant's associated control equipment is compiled in the Emissions Control Equipment table. From 2009-2021, the same information on IDs, cost, equipment type and operational status for SO2 and PM control is available in the table associated with each type of equipment in the EnvrEquipment dataset. More minimal information is also available on NOx and mercury controls. To effectively incorporate information on control equipment investments and operational status throughout time, we should pull the data from both the EnvrEquip tables (6.2 D-G) and the Emissions Control Equipment table, where available. When there is more than one source of information available for a given piece of control equipment, we should compare these values and harmonize them.
The output of this result will be an Emissions Control Equipment table for 2009-2021, with a unique ID for each piece of control equipment, containing the following information:
Preliminary Task Detail
(more in individual PRs)
Metadata preparation
Emissions Control Equipment Table (2014-2021)
Association tables (2009-2021): Boiler Cooling, Boiler Particulate Matter, Boiler SO2, Boiler NOx, Boiler Mercury
EnvrEquipment tables (FGD, FGP, Cooling, Stack Flue)
Denormalized boiler-control equipment table
Anticipated problems:
Additional possible tasks:
boilers_annual_eia860
table, existing columns refer to strategies for emissions control, but are often empty. We could compare these to operational control equipment to generate more accurate records of what emission control strategies are in place at the boiler-level. Perhaps we should collapse the many strategy code columns in the boilers entity table into a few consolidated columns based on observed equipment types.