catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Request: change EPA CEMS variable descriptions #360

Closed: karldw closed this 4 years ago

karldw commented 5 years ago

Below I've listed the current variable descriptions for each of the EPA CEMS variables, as well as my proposed change. I think these changes are helpful because they're sometimes more correct (e.g. plant_id_eia vs facility_id) or are more precise about units.

For some variables, particularly steam_load_1000_lbs, facility_id, and unit_id_epa, I'm a little unsure what to write.

CC: @cmgosnell and @gschivley

cmgosnell commented 5 years ago

Hey @karldw! Lots of these changes look great to me. I'm also unsure about those three fields, but for implementation you can add these directly into the models for now on the master branch. I'm able to regenerate the metadata for the data packaging based on the current state of the database, and I'll probably keep that functionality until the actual switch-over date, which is probably still two weeks out. Until then, you can change the comments in the models; after that, you'll be able to modify the metadata in pudl/src/pudl/package_data/meta/data_package/datapackge.json.
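On the data-package side, each column's description ends up as one entry in the table's field list. A minimal sketch, assuming the Frictionless Table Schema layout (a `fields` list of objects with `name`, `type`, and `description`); the helper `build_fields` and the type assignments are hypothetical, and the descriptions are the wordings proposed in this issue, not necessarily what was merged.

```python
import json

# Descriptions proposed in this issue (final merged wording may differ).
PROPOSED_DESCRIPTIONS = {
    "gross_load_mw": "Average power in megawatts delivered during time interval measured.",
    "heat_content_mmbtu": "The energy contained in fuel burned, measured in million BTU.",
    "plant_id_eia": (
        "The unique six-digit facility identification number, also called an "
        "ORISPL, assigned by the Energy Information Administration."
    ),
}

# Hypothetical Table Schema types for each column.
FIELD_TYPES = {
    "gross_load_mw": "number",
    "heat_content_mmbtu": "number",
    "plant_id_eia": "integer",
}


def build_fields(descriptions: dict[str, str], types: dict[str, str]) -> list[dict[str, str]]:
    """Return Table Schema-style field dicts; unknown types default to 'any'."""
    return [
        {"name": name, "type": types.get(name, "any"), "description": text}
        for name, text in descriptions.items()
    ]


schema = {"fields": build_fields(PROPOSED_DESCRIPTIONS, FIELD_TYPES)}
print(json.dumps(schema, indent=2))
```

Keeping the descriptions in one mapping means the same text can feed either the model comments or the generated JSON, so the two don't drift apart.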

gschivley commented 5 years ago

gross_load_mw
Current: Power delivered during time interval measured.
Proposed: Average power in megawatts delivered during time interval measured.

"Delivered" might imply delivery to the grid or to a final consumer. Maybe "Average gross power in megawatts produced during the time interval measured"?
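Reading gross_load_mw as an average power makes the link to energy explicit: multiply by the time the unit actually ran. A minimal sketch, assuming hourly records, a hypothetical operating_time_hours value between 0 and 1, and that the average is taken over operating time; the function name is hypothetical, so check the CEMS documentation before relying on this.

```python
def gross_generation_mwh(gross_load_mw: float, operating_time_hours: float) -> float:
    """Energy (MWh) produced in one hourly record.

    Assumes gross_load_mw is the average gross power while the unit operated
    and operating_time_hours is the part of the hour it operated (0 to 1).
    """
    if not 0.0 <= operating_time_hours <= 1.0:
        raise ValueError("operating time must lie within a one-hour record")
    return gross_load_mw * operating_time_hours


print(gross_generation_mwh(250.0, 0.5))  # 125.0
```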

heat_content_mmbtu
Current: The measure of utilization that is calculated by multiplying the quantity of fuel by the fuel's heat content.
Proposed: The energy contained in fuel burned, measured in million BTU.

Is it worth noting that it is calculated from CO2 measurements rather than from measured fuel inputs?

facility_id
Current: The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
Proposed: New EPA plant ID.

unit_id_epa
Current: Unique EPA identifier for each unit at a facility.
Proposed: New EPA unit ID.

Does "new" mean that they were created in PUDL?

plant_id_eia
Current: EIA Plant Identification number. One to five digit numeric.
Proposed: The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

Is it worth noting that in rare cases the plant ID is different in the EIA data?

karldw commented 5 years ago

@cmgosnell:

Thanks! Just to confirm, should I be using pudl/models/epacems.py on the master branch? As far as I can see, there aren't comments in that model. There are in the ./src/pudl/models/epacems.py file on the python-packaging branch. Should I open a PR against that branch? Or, since there's really no rush, I can just wait until the switch-over date.


@gschivley:

I didn't know heat content was calculated from CO2. That's interesting.


facility_id
Proposed: New EPA plant ID.

unit_id_epa
Proposed: New EPA unit ID.

Does "new" mean that they were created in PUDL?

No, they're from EPA, not generated by PUDL. The "new" part of that was only because they don't go all the way back to the oldest CEMS data. How about "EPA plant ID" and "EPA unit ID"?

gschivley commented 5 years ago

[Image: excerpt from the Plain English Guide to Part 75]

From the Plain English Guide to Part 75 (40 CFR Part 75). F-factors are provided in Appendix F of Part 75.
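The F-factor approach in Appendix F lets heat input be backed out of stack CO2 measurements instead of metered fuel. A minimal sketch, assuming a wet-basis CO2 concentration (percent) and wet-basis stack flow (scfh); the function name is hypothetical, and the carbon-based F-factor (Fc, scf of CO2 per mmBtu) must come from Appendix F for the fuel actually burned. The 1,040 scf/mmBtu used below is the value commonly cited for natural gas; verify it, and the exact equation form, against the regulation.

```python
def heat_input_mmbtu_per_hr(stack_flow_scfh: float, co2_pct_wet: float,
                            fc_scf_per_mmbtu: float) -> float:
    """Hourly heat input rate (mmBtu/hr) from wet-basis CO2 and stack flow.

    CO2 volume per hour = flow * CO2% / 100; dividing by Fc
    (scf of CO2 emitted per mmBtu of fuel burned) gives heat input.
    """
    if fc_scf_per_mmbtu <= 0:
        raise ValueError("F-factor must be positive")
    co2_scfh = stack_flow_scfh * co2_pct_wet / 100.0
    return co2_scfh / fc_scf_per_mmbtu


# Hypothetical example: 10,400,000 scfh stack flow at 10% CO2, natural gas Fc.
rate = heat_input_mmbtu_per_hr(10_400_000, 10.0, 1040.0)
print(rate)  # 1000.0
```

This is why a heat_content_mmbtu value can exist even where fuel flow is not measured directly: it is derived from the CO2 monitor and the stack flow monitor.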

cmgosnell commented 5 years ago

Hey @karldw! I started pulling these changes into the metadata. I embellished a bit on the facility_id and the unit_id_epa to make them more descriptive, but you know these fields better than I do, so feel free to change them.

karldw commented 5 years ago

Hey @cmgosnell, I didn't write a PR! I think the facility_id and unit_id_epa (called FAC_ID and UNIT_ID in the original EPA CSV) are not assigned by EIA. I assume they're assigned by EPA, though I'm not sure what system they refer to. Would it be helpful if I made a PR with this change?

karldw commented 5 years ago

Maybe worth noting: there are a lot of different plant ID systems. For example, here's a listing for the Rathdrum plant in Idaho: https://ofmpub.epa.gov/enviro/fii_query_detail.disp_program_facility?pgm_sys_acrnm_in=EIA-860&pgm_sys_id_in=7456 The corresponding facility_id is 966 with unit IDs 3118 and 3119.
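With several ID systems in play, joining CEMS records to EIA data goes through a crosswalk rather than assuming the IDs line up. A minimal sketch of such a lookup, using the Rathdrum numbers from the comment above; the class and function names are hypothetical, and treating 7456 (the EIA-860 program ID in the EPA URL) as the EIA plant ID is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EpaUnit:
    """One CEMS unit: EPA's unit ID plus the EPA facility it belongs to."""
    unit_id_epa: int
    facility_id: int


# Rathdrum, Idaho example; 7456 as the EIA plant ID is an assumption.
UNITS = [EpaUnit(unit_id_epa=3118, facility_id=966),
         EpaUnit(unit_id_epa=3119, facility_id=966)]
FACILITY_TO_EIA_PLANT = {966: 7456}


def eia_plant_for_unit(unit_id_epa: int, units: list[EpaUnit],
                       facility_to_plant: dict[int, int]) -> Optional[int]:
    """Follow unit -> EPA facility -> EIA plant; None if any link is missing."""
    for unit in units:
        if unit.unit_id_epa == unit_id_epa:
            return facility_to_plant.get(unit.facility_id)
    return None


print(eia_plant_for_unit(3119, UNITS, FACILITY_TO_EIA_PLANT))  # 7456
```

Returning None instead of guessing keeps gaps in the crosswalk visible, which matters given that the plant ID occasionally differs between EPA and EIA data.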

zaneselvans commented 4 years ago

Hey, some of these changes don't seem to have come through for some reason, so I'm going to re-open this just to keep track of the need to update them.