catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Fix conflicts between FERC plant IDs and PUDL plant IDs #241

Closed zaneselvans closed 4 years ago

zaneselvans commented 5 years ago

Now that we have both PUDL Plant IDs (generated by hand) and FERC Plant IDs, there's the possibility of conflicts between them. Multiple FERC Plant IDs can appear within a single PUDL Plant ID, but the reverse should never happen.

If the FERC Plant IDs aren't always subsets of the PUDL Plant IDs, then the datazipper will be unable to connect together FERC and EIA plants (Issue #212).

As one validation of the FERC Plant ID assignment process, we need to identify any such inconsistencies so that one or the other of the plant ID assignments can be fixed. Note, however, that because the FERC Plant IDs aren't stable, this check might not always give the same result. So as a first cut, we just need to check, and issue a warning if the two sets of IDs aren't consistent.
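For illustration, here is a minimal sketch of the kind of check described above. It is not the PUDL implementation: the function names `find_ferc_pudl_conflicts` and `warn_on_conflicts`, the input shape (an iterable of `(plant_id_ferc1, plant_id_pudl)` pairs), and the toy data are all assumptions made for this example.

```python
import warnings
from collections import defaultdict


def find_ferc_pudl_conflicts(records):
    """Return {plant_id_ferc1: [plant_id_pudl, ...]} for FERC IDs seen under >1 PUDL ID.

    `records` is assumed to be an iterable of (plant_id_ferc1, plant_id_pudl) pairs.
    """
    pudl_ids_by_ferc = defaultdict(set)
    for ferc_id, pudl_id in records:
        pudl_ids_by_ferc[ferc_id].add(pudl_id)
    # A FERC plant ID is consistent only if it sits inside exactly one PUDL plant ID.
    return {
        ferc_id: sorted(pudl_ids)
        for ferc_id, pudl_ids in pudl_ids_by_ferc.items()
        if len(pudl_ids) > 1
    }


def warn_on_conflicts(records):
    """Warn (rather than fail) if any FERC plant ID spans several PUDL plant IDs."""
    conflicts = find_ferc_pudl_conflicts(records)
    if conflicts:
        warnings.warn(
            f"{len(conflicts)} plant_id_ferc1 value(s) appear in more than one "
            f"plant_id_pudl: {conflicts}"
        )
    return conflicts


# Toy data, made up for illustration: FERC plant 2 spans PUDL plants 10 and 11.
toy_records = [(1, 10), (3, 10), (2, 10), (2, 11), (4, 12)]
conflicts = warn_on_conflicts(toy_records)
```

Several FERC IDs sharing one PUDL ID (1, 2 and 3 all appear under PUDL plant 10 here) is allowed; only the reverse is flagged. Issuing a warning instead of raising matches the "first cut" goal, since the FERC IDs may shift between runs.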

zaneselvans commented 5 years ago

It turns out that about 1% (~20) of the plant_id_ferc1 values show up in more than one plant_id_pudl, and some of them show up in as many as 4 PUDL Plant IDs. There are 222 affected records, involving 35 different plant_id_pudl values. As far as I can tell from inspecting them, all the inconsistencies seem to be due to incorrect or incomplete manual assignment of plant_id_pudl values. So someone needs to get into the plant & utility matching spreadsheet...

There are a number of things that could be done in there that would improve or expand the FERC Form 1 data. See e.g. #284 and #286. @cmgosnell or @stevenbwinter, would either of you be up for editing it? What inputs would you need from me?

The plant_id_pudl values that are involved:

[245, 11824, 241, 1139, 237, 1137, 621, 1286, 1293, 45, 1247,
 1245, 1246, 600, 1188, 383, 1090, 113, 799, 558, 518, 464,
 11826, 1294, 1296, 648, 1240, 1068, 557, 1273, 603, 1217,
 481, 11827, 41]
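As a rough sketch of how summary figures like the ones in this comment could be tallied (the number of conflicting FERC IDs, affected records, and involved PUDL IDs), something along these lines would work. The function name `summarize_conflicts`, the pair-based input, and the toy data are assumptions for illustration, not the code that produced the numbers above.

```python
from collections import defaultdict


def summarize_conflicts(records):
    """Summarize FERC plant IDs that fall under more than one PUDL plant ID.

    `records` is assumed to be an iterable of (plant_id_ferc1, plant_id_pudl)
    pairs, one pair per FERC plant record.
    """
    records = list(records)  # we iterate twice, so materialize any generator
    pudl_ids_by_ferc = defaultdict(set)
    for ferc_id, pudl_id in records:
        pudl_ids_by_ferc[ferc_id].add(pudl_id)

    bad_ferc_ids = {f for f, p in pudl_ids_by_ferc.items() if len(p) > 1}
    affected_pudl_ids = set().union(*(pudl_ids_by_ferc[f] for f in bad_ferc_ids))
    return {
        "n_ferc_ids": len(pudl_ids_by_ferc),
        "n_conflicting_ferc_ids": len(bad_ferc_ids),
        "n_affected_records": sum(1 for f, _ in records if f in bad_ferc_ids),
        "affected_pudl_ids": sorted(affected_pudl_ids),
        "max_pudl_ids_per_ferc_id": max(
            (len(pudl_ids_by_ferc[f]) for f in bad_ferc_ids), default=0
        ),
    }


# Toy data, made up for illustration: FERC plants 2 and 3 are inconsistent.
toy_records = [
    (1, 10), (1, 10),
    (2, 10), (2, 11), (2, 11),
    (3, 12), (3, 13), (3, 14),
]
summary = summarize_conflicts(toy_records)
```

The `affected_pudl_ids` entry is the analogue of the list above: the PUDL plant IDs someone would need to revisit in the matching spreadsheet.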