catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Apply DataZipper to FERC / EIA Plant Mapping #212

Closed zaneselvans closed 3 years ago

zaneselvans commented 5 years ago

Once the FERC plants time series are being reliably generated automatically (#144) and the DataZipper has been implemented in a general way that works for integration of EIA and EPA CEMS data (#208), it can be applied to the more difficult marriage of the FERC plants and EIA generation units.

gschivley commented 5 years ago

How much more needs to be done on this issue? I'm looking at using PUDL for a project over the summer and would like to use the FERC/EIA connections. Let me know if there's something I can help with.

cmgosnell commented 5 years ago

We've actually put the FERC/EIA generation unit connection on the back burner. The main reason we wanted this in the first place was to connect non-fuel variable operational expenses in FERC to EIA. Most FERC plant records are reported by plant, not by generator... but enough are reported by generator that we wanted the generator level connection. We already have the plant level connection via the plant_id_pudl. That mapping was done by hand, but we've used it a fair amount and it seems pretty good.

All that being said, after looking further into the operational expenses reported in FERC, we realized there is not enough data or consistency for us to actually decide which expenses are variable and which are fixed, so we decided to get at the same info through a different method (Issue 204), which obviated the need for generation unit connections between FERC and EIA.

If you need the unit connections instead of plant level connections then let's figure out how to make that happen. @zaneselvans has largely been the one working on the zipper. It has yet to be implemented for the EIA/CEMS connection, but a lot of good progress has been made.

gschivley commented 5 years ago

Good to know that plant matching is done and this is about generator level. The regression method in #204 is probably along the lines of what I'm looking for. And the PJM report might even be enough.

I'm still a couple months away from doing any work but am starting to scope out the LOE and where I'll be getting data from. My plan is to fork PUDL, use the EIA (and maybe FERC) portions for generator data, and then add data of my own. Hopefully I can then offer some of the new data back for other ppl to take advantage of.

cmgosnell commented 5 years ago

Oh great! I hope (fingers crossed) that the regressions will be done and incorporated within a couple of months... @alanawlsn and @zaneselvans have been exploring methods for a while now.

zaneselvans commented 5 years ago

The plant_id_pudl breaks things down by physical facility, but doesn't differentiate between different fuels or owners. However, we have additional information from the fuel table, which lets us identify plant records that pertain to coal vs. gas plants (e.g. records where more than 90% of the overall heat content, or whatever threshold you want, came from one fuel or the other), and we have the utility_id_ferc1, which tells you who owns the plant. Adding up all the ownership slices within each fuel type and treating the result as a "generator" yields a believable distribution of fuel costs and heat rates. In reality this almost certainly groups multiple coal units together, and multiple gas units together. However, it may be possible to differentiate those units from each other based on the plant_id_ferc1... but we haven't gotten that far yet.
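For anyone following along, here is a rough sketch of the grouping described above, in plain Python so it runs without dependencies. It is not the PUDL implementation. The record layout and the `mmbtu_by_fuel` field are hypothetical; only `plant_id_pudl`, `utility_id_ferc1`, and `net_generation_mwh` are names from this thread, and the 90% threshold is just the example value mentioned above.

```python
from collections import defaultdict


def primary_fuel(mmbtu_by_fuel, threshold=0.9):
    """Return the dominant fuel if its share of heat content meets the threshold."""
    total = sum(mmbtu_by_fuel.values())
    if total <= 0:
        return None
    fuel, heat = max(mmbtu_by_fuel.items(), key=lambda kv: kv[1])
    return fuel if heat / total >= threshold else None


def fuel_generators(records, threshold=0.9):
    """Sum ownership slices into one pseudo-"generator" per (plant, fuel).

    Each record is assumed (hypothetically) to be one owner's slice of a
    plant: plant_id_pudl, utility_id_ferc1, net_generation_mwh, and a dict
    of heat content by fuel. Mixed-fuel records are dropped.
    """
    totals = defaultdict(lambda: {"mmbtu": 0.0, "net_generation_mwh": 0.0})
    for rec in records:
        fuel = primary_fuel(rec["mmbtu_by_fuel"], threshold)
        if fuel is None:
            continue  # no single fuel dominates, so skip this record
        key = (rec["plant_id_pudl"], fuel)
        totals[key]["mmbtu"] += sum(rec["mmbtu_by_fuel"].values())
        totals[key]["net_generation_mwh"] += rec["net_generation_mwh"]
    return dict(totals)


# Made-up numbers, purely for illustration.
example_records = [
    {"plant_id_pudl": 1, "utility_id_ferc1": 10, "net_generation_mwh": 100.0,
     "mmbtu_by_fuel": {"coal": 950.0, "gas": 50.0}},
    {"plant_id_pudl": 1, "utility_id_ferc1": 11, "net_generation_mwh": 200.0,
     "mmbtu_by_fuel": {"coal": 1900.0, "gas": 100.0}},
    {"plant_id_pudl": 2, "utility_id_ferc1": 10, "net_generation_mwh": 50.0,
     "mmbtu_by_fuel": {"coal": 500.0, "gas": 500.0}},
]
generators = fuel_generators(example_records)
```

The heat rate of each pseudo-generator is then its `mmbtu` divided by its `net_generation_mwh`. Summing before dividing means the two ownership slices of plant 1 are treated as one unit rather than averaged.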

Glad you're finding it all useful, and I hope we get to merge some of your work in! After spending way too much time on grant writing we've finally gotten back to the code in the last week or two.

zaneselvans commented 5 years ago

[Figure: FERC1_Heat_Rates]

Here's what the heat rates look like right now, weighted by net_generation_mwh -- pretty clean! This is cumulative from 2004-2017.
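A minimal sketch of what "weighted by net_generation_mwh" could mean for a histogram like this: each record adds its generation, rather than a count of one, to the bin its heat rate falls in. The function name, bin edges, and sample numbers are all made up for illustration and are not taken from the actual plotting code.

```python
def weighted_histogram(values, weights, lo, hi, n_bins):
    """Equal-width histogram over [lo, hi) where each value adds its weight."""
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for value, weight in zip(values, weights):
        if lo <= value < hi:
            # Clamp to guard against floating point rounding at the top edge.
            index = min(int((value - lo) / width), n_bins - 1)
            bins[index] += weight
    return bins


# Hypothetical heat rates (mmBtu/MWh) and net generation (MWh).
heat_rates = [9.5, 10.5, 10.7, 25.0]
net_generation_mwh = [100.0, 200.0, 50.0, 999.0]
hist = weighted_histogram(heat_rates, net_generation_mwh, lo=8.0, hi=12.0, n_bins=4)
```

Weighting this way keeps small units with odd heat rates from dominating the picture, since a record's influence scales with how much electricity it actually produced.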

zaneselvans commented 5 years ago

[Figure: FERC1_and_EIA_Heat_Rates_2009-2017]

And here's a comparison of "generator" level heat rate distributions (i.e. plants chopped up by major fuel) for coal and gas plants across the years that we have data from both FERC & EIA, normalized so they have the same scales. Pretty darned similar!
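One way to put two binned distributions on the same scale, sketched here as an assumption rather than a description of how the figure was produced, is to divide each histogram by its own total so the bins sum to 1. The bin values below are invented.

```python
def normalize(bin_weights):
    """Scale bin weights so they sum to 1, making histograms comparable."""
    total = sum(bin_weights)
    if total <= 0:
        raise ValueError("cannot normalize an empty histogram")
    return [weight / total for weight in bin_weights]


# Hypothetical generation-weighted bins for the same heat rate ranges.
ferc_bins = [0.0, 100.0, 250.0, 0.0]
eia_bins = [0.0, 1000.0, 2500.0, 0.0]

ferc_share = normalize(ferc_bins)
eia_share = normalize(eia_bins)

# Largest per-bin gap between the two shapes; 0 means identical shapes.
max_gap = max(abs(f - e) for f, e in zip(ferc_share, eia_share))
```

After normalizing, differences in total reported generation between the two sources drop out and only the shapes of the distributions are compared.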

zaneselvans commented 5 years ago

[Figure: FERC1_and_EIA_Fuel_Costs_2009-2017]

Interestingly, the correspondence for fuel costs on a $/MWh basis is not nearly as good, at least for natural gas, where FERC appears to report that gas is systematically more expensive than EIA. I wonder if this is because co-ops and public power companies and some IOUs in competitive markets don't report to FERC? Or the redacted (mainly gas) transactions in EIA? What else could it be?

cmgosnell commented 3 years ago

Closing because we've been developing the more granular connection between FERC1 and EIA in this repo. As soon as the FERC-EIA record linkage process gets settled a bit we are planning on moving it over into PUDL.