Closed zaneselvans closed 4 years ago
It turns out that about 1% (~20) of the plant_id_ferc1 values show up in more than one plant_id_pudl, and some of them show up in as many as 4 PUDL Plant IDs. There are 222 affected records, involving 35 different plant_id_pudl values, and so far as I can tell from inspecting them, all the inconsistencies seem to be due to incorrect or incomplete manual assignment of plant_id_pudl values. So someone needs to get into the plant & utility matching spreadsheet...
There are a number of things that could be done in there that would improve or expand the FERC Form 1 data. See e.g. #284 and #286. @cmgosnell or @stevenbwinter would either of you be up for editing it? What inputs would you need from me?
The plant_id_pudl values that are involved:
[245, 11824, 241, 1139, 237, 1137, 621, 1286, 1293, 45, 1247,
1245, 1246, 600, 1188, 383, 1090, 113, 799, 558, 518, 464,
11826, 1294, 1296, 648, 1240, 1068, 557, 1273, 603, 1217,
481, 11827, 41]
Now that we have both PUDL Plant IDs (generated by hand) and FERC Plant IDs, there's the possibility of conflicts between them. Multiple FERC Plant IDs can appear within a single PUDL Plant ID, but the reverse should never happen.
If the FERC Plant IDs aren't always subsets of the PUDL Plant IDs, then the datazipper will be unable to connect together FERC and EIA plants (Issue #212).
As one validation of the FERC Plant ID assignment process, we need to identify any such inconsistencies so that one or the other of the plant ID assignments can be fixed. Note, however, that because the FERC Plant IDs aren't stable... this might not always have the same result. So as a first cut, we just need to check, and issue a warning if they aren't consistent.
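A minimal sketch of what that first-cut check could look like, assuming the FERC plant records are available as a pandas DataFrame with plant_id_ferc1 and plant_id_pudl columns. The function name `find_inconsistent_ferc_ids` and the toy data are hypothetical, made up for illustration, and are not part of the PUDL codebase:

```python
import warnings

import pandas as pd


def find_inconsistent_ferc_ids(df, ferc_col="plant_id_ferc1", pudl_col="plant_id_pudl"):
    """Return the records whose FERC plant ID maps to more than one PUDL plant ID.

    Hypothetical helper: column names are assumed from the issue text.
    """
    # Count the distinct PUDL plant IDs associated with each FERC plant ID.
    n_pudl = df.groupby(ferc_col)[pudl_col].nunique()
    bad_ferc_ids = n_pudl[n_pudl > 1].index
    bad_records = df[df[ferc_col].isin(bad_ferc_ids)]
    if len(bad_ferc_ids) > 0:
        # Only warn rather than fail, since the FERC plant IDs may change between runs.
        warnings.warn(
            f"{len(bad_ferc_ids)} {ferc_col} values map to multiple {pudl_col} values "
            f"({len(bad_records)} records, "
            f"{bad_records[pudl_col].nunique()} distinct {pudl_col} values)."
        )
    return bad_records


# Toy data (made up): FERC plant 3 is split across PUDL plants 20 and 21,
# while PUDL plant 10 legitimately contains FERC plants 1 and 2.
toy = pd.DataFrame({
    "plant_id_ferc1": [1, 1, 2, 3, 3, 3],
    "plant_id_pudl": [10, 10, 10, 20, 21, 20],
})
bad = find_inconsistent_ferc_ids(toy)
print(sorted(bad["plant_id_pudl"].unique()))
```

The returned records could then be used to list the plant_id_pudl values that need attention in the matching spreadsheet. Multiple FERC plant IDs inside one PUDL plant ID are allowed, so the check only groups in one direction.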