USEPA / camd-eia-crosswalk

A data crosswalk to integrate U.S. power sector emission and operation data from EPA to EIA
MIT License

Missing CAMD_PLANT_ID values #28

Open aesharpe opened 1 year ago

aesharpe commented 1 year ago

I've been using the CAMD-EIA crosswalk to connect data from the CAMD CEMS dataset and the EIA Form 860.

I noticed that there are some ORISPL_CODE values in the CEMS dataset that are missing from the crosswalk under CAMD_PLANT_ID, the field I believe is the crosswalk equivalent.

Here are the 140 ORISPL_CODE values that are in the CEMS data but not in the crosswalk:

[5, 247, 312, 334, 375, 569, 596, 604, 646, 647,
 658, 668, 699, 700, 734, 964, 1294, 1360, 1372, 1392,
 1458, 1470, 1496, 1555, 1557, 1585, 1589, 1918, 2397, 2473,
 2497, 2502, 2529, 2531, 2629, 2640, 2642, 2858, 2867, 2877,
 2947, 3099, 3109, 3110, 3112, 3114, 3120, 3134, 3139, 3142,
 3143, 3144, 3145, 3146, 3147, 3154, 3155, 3182, 3334, 3419, 
 3436, 3438, 3440, 3442, 3451, 3454, 3455, 3461, 3471, 3480,
 3493, 3503, 3523, 3524, 3526, 3527, 3549, 3610, 4036, 4233,
 4938, 6025, 6598, 7185, 7945, 7996, 8058, 10114, 10252, 10321,
 10430, 10522, 10616, 10618, 10628, 10883, 13213, 14013, 50459, 50468,
 50855, 50954, 54088, 54089, 54138, 54656, 54807, 55082, 55209, 55303,
 55373, 55486, 55683, 55858, 56186, 57185, 59882, 60589, 60698, 60925,
 60926, 60927, 61028, 61035, 61241, 61242, 880009, 880013, 880020, 880021,
 880022, 880026, 880066, 880068, 880070, 880077, 880081, 880091, 880094, 880109]
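For anyone who wants to reproduce a list like this, here is a minimal sketch of the comparison, assuming the ORISPL_CODE values from CEMS and the CAMD_PLANT_ID values from the crosswalk have already been pulled out as plain sequences. The function name and the tiny sample values are made up for illustration; only the two column names come from the datasets.

```python
def missing_plant_ids(cems_orispl_codes, crosswalk_camd_plant_ids):
    """Return the sorted IDs present in CEMS but absent from the crosswalk."""
    cems_ids = set(cems_orispl_codes)            # duplicates collapse here
    crosswalk_ids = set(crosswalk_camd_plant_ids)
    return sorted(cems_ids - crosswalk_ids)      # set difference


# Made-up stand-in values, not real extracts from either dataset.
cems_sample = [3, 5, 5, 247, 880009]
crosswalk_sample = [3, 10, 26]

missing = missing_plant_ids(cems_sample, crosswalk_sample)
print(missing)
```

With pandas, the same idea would apply to the unique values of each column after dropping nulls.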

A good chunk of these seem to correlate directly with EIA_PLANT_ID values from 860.
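A rough way to check that overlap, assuming the EIA_PLANT_ID values from 860 are available as a plain sequence (the helper name and sample IDs below are hypothetical). A matching ID value only suggests a candidate match; it does not confirm that both IDs refer to the same plant.

```python
def split_by_eia_match(missing_ids, eia_plant_ids):
    """Split missing CAMD IDs into those that also appear as EIA plant IDs and those that don't."""
    eia = set(eia_plant_ids)
    matched = sorted(i for i in missing_ids if i in eia)
    unmatched = sorted(i for i in missing_ids if i not in eia)
    return matched, unmatched


# Made-up stand-in values for illustration only.
missing_sample = [5, 247, 880009]
eia_sample = [3, 5, 247, 1001]

matched, unmatched = split_by_eia_match(missing_sample, eia_sample)
print(matched)
print(unmatched)
```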

What is the best way to integrate these into the crosswalk? Should I use the manual mapping form?

j-tafoya commented 1 year ago

Hi @aesharpe, thank you for the contribution! We're in the final stages of preparing the next release, which adds several manual matches, and many of them may overlap with the ones you flagged. We will review these as we do our final QA. You should see the new release soon!

aesharpe commented 1 year ago

Woohoo! Looking forward to it, thanks!

j-tafoya commented 1 year ago

Hi @aesharpe!

Thanks again for your contribution! We have reviewed some of the plants you flagged. At this time we don’t have the bandwidth to investigate them all; however, the following information might help explain some of the discrepancies you’ve identified in this list.

First, we noticed that some of the plants you identified reported to CAMD in the past but stopped before 2009 (e.g., they retired or stopped reporting due to changes in regulatory program coverage). The year 2009 is the cutoff for the data available via the FACT API, which is where we obtain the CAMD data for the crosswalk.

Second, we are currently prioritizing manual matches for operating units, and some of the units you identified were not operating (e.g., retired, proposed) as of 2018, which precludes them from matching under the current crosswalk methodology. (Note: we plan to address manual matches for non-operating units in a later update, and these will be helpful!)

Third, some of these facilities may have somewhat unique circumstances that led to inconsistent reporting in the past. For example, Plant 880009 doesn’t meet the criteria to be affected by CAMD’s programs (e.g., its capacity is at or below the 25 MW threshold) but might have reported data to CAMD for other reasons. This may keep the facility from being pulled into the crosswalk from the FACT API. As referenced in the README, facilities whose IDs begin with ‘88’ followed by 4 digits are not grid-connected.
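As a rough illustration of that last point, the ‘88’-prefixed IDs can be separated from the rest of a list with a simple pattern check. This is only a sketch: the helper name is hypothetical, and the pattern is a literal reading of "‘88’ followed by 4 digits", so the README remains the authority on the exact rule.

```python
import re

# "88" followed by exactly four digits, i.e. a six-digit ID.
NON_GRID_PATTERN = re.compile(r"88\d{4}")


def split_non_grid_connected(plant_ids):
    """Split plant IDs into those matching the 88xxxx pattern and all others."""
    non_grid, other = [], []
    for plant_id in plant_ids:
        if NON_GRID_PATTERN.fullmatch(str(plant_id)):
            non_grid.append(plant_id)
        else:
            other.append(plant_id)
    return non_grid, other


# A few IDs taken from the list in the original post.
sample_ids = [5, 8058, 61242, 880009, 880109]
non_grid, other = split_non_grid_connected(sample_ids)
print(non_grid)
print(other)
```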

Hopefully this information is helpful for understanding why these units aren’t captured by our methodology at this time, but this doesn’t preclude you from using them in your own analysis.