catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

EPA CEMS output routines #207

Closed zaneselvans closed 3 years ago

zaneselvans commented 6 years ago

When the EPA CEMS data is fully integrated (see Issue #171) we need to add the option to output it in a useful tabular format, alongside the other data sources within the output module. This should probably allow partial output of the (huge) dataset, filtering by e.g.
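To make the idea of partial output concrete, here is a minimal sketch of a lazy row filter. It is not existing PUDL code. The function name `filter_cems_rows`, the column names, and the choice of state and year as filter fields are all assumptions made for illustration, since the list of filters is not spelled out above.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional, Set

# Hypothetical column names and values; the real CEMS table layout may differ.
SAMPLE_CSV = """state,year,plant_id,gross_load_mw
CO,2016,1,100.0
CO,2017,1,110.0
TX,2017,2,250.0
"""


def filter_cems_rows(
    rows: Iterable[Dict[str, str]],
    states: Optional[Set[str]] = None,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
) -> Iterator[Dict[str, str]]:
    """Yield only the rows matching the requested filters.

    Rows are streamed one at a time, so the full dataset never has to
    be held in memory. A filter left as None is not applied.
    """
    for row in rows:
        year = int(row["year"])
        if states is not None and row["state"] not in states:
            continue
        if start_year is not None and year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        yield row


# Usage: keep only Colorado rows from 2017 onward.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
selected = list(filter_cems_rows(reader, states={"CO"}, start_year=2017))
```

Writing the filter as a generator keeps memory use flat regardless of how large the underlying dataset is, which seems like the main constraint for a dataset of this size.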

karldw commented 5 years ago

I was digging around in the EPA FTP site and came across a cem_correct program as part of the broader SMOKE modeling utilities: ftp://newftp.epa.gov/Air/emismod/2016/alpha/smoke_2014v7_2_platform_utilities.zip

Do you want to do any similar corrections, either on CEMS ingest or output? I think this file shows the annual total corrections for each unit for SO2 and NOx: ftp://newftp.epa.gov/Air/emismod/2016/beta/reports/EGU/2016b_cems_egu_comparison_11apr2019.xlsx

Quoting the cem_correct readme:

Description

Under Part 75 of Volume 40 in the Code of Federal Regulations, continuous emissions monitoring (CEM) and reporting is required for large EGUs and industrial facilities. The U.S. EPA Clean Air Markets Division (CAMD) collects and distributes hourly CEM data for NOx and SO2 emissions (lbs/hr), heat input (mmBTU), gross load (MW), and steam load (1000 lbs/hr) for thousands of U.S. sources from the year 1995 to the present. Some units are required to report hourly emissions year-round (annual reporters), while other units are only required to report hourly emissions for part of the year (partial year reporters). To satisfy the Part 75 requirement that CEM data are reported for every operating hour that is required to report emissions, a complex process for reporting and filling in missing data has been defined. Many times, missing emissions are substituted with values that are much larger than the actual emissions that were emitted.

The program CEM Correct identifies anomalous values in the CEM database, determines if they are substituted values, and replaces the anomalies with mean data values. Details of the algorithms implemented in CEM Correct are available here:

http://www.ie.unc.edu/cempd/projects/SEMAP/secure/documents/SEMAP_EGU_Modeling_Approach_11_17_2011.pdf [Note: link is broken]

zaneselvans commented 5 years ago

Huh, why would they not integrate these corrections into the dataset they publish? If they are widely accepted, and the operations for applying the corrections are well defined, it does seem like something we might want to do, or at least provide the option for other folks to do after the fact.
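For discussion, here is a rough sketch of what an optional after-the-fact correction could look like. It is not the CEM Correct algorithm, whose documentation link is broken. It only follows the readme's one-line summary (find anomalous values, check whether they were substituted, replace them with a mean). The function name, the per-hour `substituted` flag, and the `factor` threshold are all assumptions.

```python
from statistics import mean
from typing import List, Sequence


def correct_substituted(
    values: Sequence[float],
    substituted: Sequence[bool],
    factor: float = 3.0,
) -> List[float]:
    """Replace anomalously large substituted values with the measured mean.

    values:      hourly emissions for a single unit.
    substituted: parallel flags, True where the hour was filled in by a
                 missing-data procedure (a hypothetical input column).
    factor:      assumed threshold; a substituted value counts as
                 anomalous when it exceeds factor * mean of measured hours.
    """
    measured = [v for v, s in zip(values, substituted) if not s]
    if not measured:
        # No measured hours to build a baseline from; leave data untouched.
        return list(values)
    baseline = mean(measured)
    return [
        baseline if s and v > factor * baseline else v
        for v, s in zip(values, substituted)
    ]


# Usage: hour 3 is a large substituted value, hour 5 a plausible one.
hourly = [10.0, 12.0, 500.0, 11.0, 14.0]
flags = [False, False, True, False, True]
corrected = correct_substituted(hourly, flags)
```

Only hours that are both flagged as substituted and far above the measured baseline get replaced, so genuine measured spikes are left alone. Returning a new list rather than modifying the input would let the correction stay an opt-in output step.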

