catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
471 stars, 108 forks

Support daily CEMS? #142

Closed karldw closed 6 years ago

karldw commented 6 years ago

Currently the epacems type is set up to download the hourly CEMS data. Would you also like to download the daily? I've heard, but can't confirm, that the hourly and daily sometimes differ (possibly because EPA does additional cleaning on the daily).

URL: ftp://ftp.epa.gov/dmdnload/emissions/daily/quarterly
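One way to check whether the two products actually differ would be to sum the hourly records up to the day and compare them against the daily file. This is only a minimal sketch: the column names (`unit_id`, `op_date`, `so2_mass`) and the helper `compare_hourly_to_daily` are hypothetical stand-ins, not the real CEMS headers, and the toy numbers are made up.

```python
import pandas as pd


def compare_hourly_to_daily(hourly, daily, keys=("unit_id", "op_date"), value="so2_mass"):
    """Sum hourly records by day and line them up against the daily file.

    The column names are assumptions for illustration only.
    """
    keys = list(keys)
    # Collapse the hourly records to one row per unit and day.
    hourly_sum = hourly.groupby(keys, as_index=False)[value].sum()
    # An outer join keeps unit-days that appear in only one of the two sources.
    merged = hourly_sum.merge(
        daily[keys + [value]], on=keys, how="outer", suffixes=("_hourly", "_daily")
    )
    merged["diff"] = merged[f"{value}_hourly"] - merged[f"{value}_daily"]
    return merged.sort_values(keys).reset_index(drop=True)


# Toy data standing in for the real files.
hourly = pd.DataFrame(
    {
        "unit_id": [1, 1, 1, 1],
        "op_date": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"],
        "so2_mass": [1.0, 2.0, 3.0, 4.0],
    }
)
daily = pd.DataFrame(
    {
        "unit_id": [1, 1],
        "op_date": ["2017-01-01", "2017-01-02"],
        "so2_mass": [3.0, 6.5],
    }
)

result = compare_hourly_to_daily(hourly, daily)
```

Rows with a non-zero (or missing) `diff` would be the unit-days worth a closer look.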

zaneselvans commented 6 years ago

It would definitely be interesting to see if there are differences between the hourly and daily. Do you know what kind of differences people have found? Or what kind of cleaning might be getting done?

Much of our interest in the CEMS data is in constraining operational characteristics like ramp rates (which are often redacted as proprietary information in PUC proceedings). We're also interested in intra-day operational patterns, so we can see whether "baseload" plants are really being used as baseload, or whether they're becoming firming resources as they stop being least-cost generation options. We also want to understand which fossil plants are most valuable to keep around as flexibility resources when targeting plants for early retirement.

zaneselvans commented 6 years ago

Also, we haven't really dug into integrating the CEMS data yet, since it's not our highest priority right now, and it's a couple of orders of magnitude larger than all the other data combined... which means we're going to have to deal with it differently somehow. But at first glance it looks much simpler and cleaner than the EIA or (god forbid) the FERC data.

karldw commented 6 years ago

I don't know much about either the differences or the cleaning. That's an interesting application. I'm interested in the actual emissions info, rather than ramping concerns, so I've started working with the daily data.

Whenever it becomes relevant, the CEMS files are stored as a single CSV inside each zipfile. Pandas' read_csv can read this configuration directly, without any unzipping.
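For reference, a minimal sketch of reading a zipped CSV with pandas. The archive here is a tiny made-up file built on the fly, so the file name and columns are placeholders rather than real CEMS content; `read_zipped_csv` is just an illustrative wrapper.

```python
import os
import tempfile
import zipfile

import pandas as pd


def read_zipped_csv(zip_path):
    """Read the single CSV stored inside a zip archive into a DataFrame."""
    # compression="zip" requires the archive to contain exactly one file.
    # pandas would also infer this from the ".zip" extension by default.
    return pd.read_csv(zip_path, compression="zip")


# Build a small stand-in archive; names and values are made up.
with tempfile.TemporaryDirectory() as tmpdir:
    zip_path = os.path.join(tmpdir, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("example.csv", "plant_id,so2_mass\n1,10.5\n2,3.25\n")
    df = read_zipped_csv(zip_path)
```

Note that this only works when the zipfile holds exactly one file; an archive with several members would need to be opened with `zipfile` first.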

gschivley commented 6 years ago

Column names for the CEMS ftp data change around 2008, which is something of a pain. I have some public code for downloading all the zip files with Python as part of a recent paper. It's a little rough but can serve as a starting point if anyone is interested.
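As a rough illustration of handling the header change (this is not the code from the paper), one option is to normalize every header to a common form and then apply an explicit rename map before concatenating files from different years. The headers and the `RENAMES` mapping below are hypothetical examples; the real names would need to be checked against the files.

```python
import pandas as pd

# Hypothetical mapping from an older normalized name to the newer one.
RENAMES = {"so2_mass": "so2_mass_tons"}


def harmonize_columns(df, renames=RENAMES):
    """Return a copy of df with normalized, consistently named columns."""
    cleaned = (
        df.columns.str.strip()
        .str.lower()
        # Collapse any run of non-alphanumeric characters to one underscore.
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    out = df.copy()
    out.columns = cleaned
    return out.rename(columns=renames)


# Toy frames standing in for an older and a newer file.
old = pd.DataFrame({"SO2_MASS": [1.0], "OP_DATE": ["01-01-2005"]})
new = pd.DataFrame({"SO2_MASS (tons)": [2.0], "OP_DATE": ["01-01-2010"]})

combined = pd.concat(
    [harmonize_columns(old), harmonize_columns(new)], ignore_index=True
)
```

Keeping the rename map explicit makes it easy to see which columns were assumed to be equivalent across years.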