catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Finish DataZipper and apply to EPA CEMS + EIA 923 #208

Closed zaneselvans closed 3 years ago

zaneselvans commented 6 years ago

The EIA 923 data is reported by plant, and within plant, by boiler and generator. We have mapped the boilers and generators together into generation units (which are sub-plant in scale).

The EPA CEMS data uses almost exactly the same plant IDs (see Issue #178), but rather than reporting by generation unit, it reports by emissions unit, i.e. by smokestack.

This is an ideal first use case for the datazipper algorithm: there's little to no uncertainty in the shared ID space (plant_id_eia), there are only a small number of generation / emissions units within each plant, and there are multiple mutually reported variables (electricity generation, heat content of fuel input). Before applying the datazipper to the FERC / EIA 923 case, we can apply it here and know that it should work cleanly. This will allow the attributes of the generation units we've inferred from the EIA 860 & EIA 923 data (such as fuel costs and heat rates) to be meaningfully combined with the detailed operational time series available from EPA CEMS.
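To make the idea concrete, here is a rough sketch of matching units within a plant on a mutually reported variable. It is not the datazipper itself: it assumes a simple one-to-one pairing chosen by Pearson correlation of generation time series, whereas the real mapping between generation units and smokestacks can be many-to-many. The function names (`pearson`, `match_units`), the unit labels, and all the numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_units(eia_units, cems_units):
    """Pair each EIA generation unit with the best-correlated CEMS unit in the same plant.

    Both arguments map (plant_id_eia, unit_id) -> list of generation values
    over the same reporting periods. This layout is an assumption of the sketch.
    """
    matches = {}
    for (plant_id, gen_unit), gen_series in eia_units.items():
        # Only compare against emissions units that share the plant ID.
        candidates = {
            ems_unit: pearson(gen_series, ems_series)
            for (ems_plant, ems_unit), ems_series in cems_units.items()
            if ems_plant == plant_id
        }
        if candidates:
            matches[(plant_id, gen_unit)] = max(candidates, key=candidates.get)
    return matches


# Hypothetical data: one plant, two generation units, two smokestacks.
eia_units = {
    (1, "gen_unit_1"): [100, 120, 90, 110],
    (1, "gen_unit_2"): [50, 40, 60, 45],
}
cems_units = {
    (1, "stack_A"): [52, 41, 63, 44],
    (1, "stack_B"): [101, 118, 92, 111],
}
matches = match_units(eia_units, cems_units)
```

With this toy data, `gen_unit_1` pairs with `stack_B` and `gen_unit_2` with `stack_A`. Because plant_id_eia is shared and each plant has only a few units, the candidate set per plant stays small, which is what makes this case a clean test.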

cmgosnell commented 3 years ago

@zaneselvans I'm closing this because I believe it's no longer relevant now that we have the EPA crosswalk.