zaneselvans closed this issue 4 years ago
Hey @swinter2011 this is ready for you if you want to map / assign IDs to the plants that have data.
@cmgosnell and I chatted more generally on the design call this morning about the criteria for selecting EIA plants & utilities to double-check for FERC connections when doing the mapping, and we decided that for now we don't want to try and pull in more EIA IDs. Since the PUDL IDs are really only for gluing FERC & EIA together now, we just need to figure out what combination of criteria (regulatory status, plant size, ownership type, etc.) will select a reasonable slice of EIA entities to double-check against FERC each year to make sure we didn't miss any connections. But we're going to put that off until next time around.
For some reason, there are still 865 `utility_id_eia` values for which there are records in the EIA 923 data, which do not have a `utility_id_pudl` assigned to them. For completeness, and just in case they map to existing FERC utilities somehow, it seems like we should get them added to the mapping spreadsheet.

Sub-tasks:
- `ferc1_eia` glue notebook: generate this list of utilities (have data, but no PUDL utility ID) when we are looking for new utilities each year.

The records which have `utility_id_eia` values that lack `utility_id_pudl` have the following distribution...

It looks like we mapped all of the utilities initially when we did 2011-2015, but since then we've been missing a chunk of utilities every year, and then there are some from 2009 too. Strangely, nothing in 2010 though? Really?

However, looking at records in the EIA 860 plants/generators/ownership tables where the `utility_id_pudl` field is null (but by definition there is a plant or generator present), we get this distribution of dates:
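
For anyone who wants to regenerate counts like these, here is a minimal pandas sketch of the check described above. It is not the actual glue notebook code. The function name, the toy dataframes, and the `report_date` column are assumptions for illustration; only `utility_id_eia` and `utility_id_pudl` come from the discussion here.

```python
import pandas as pd


def unmapped_utilities_by_year(records: pd.DataFrame, id_map: pd.DataFrame) -> pd.Series:
    """Count distinct utility_id_eia values with no utility_id_pudl, per report year.

    Hypothetical helper: `records` needs utility_id_eia and report_date columns,
    `id_map` needs utility_id_eia and utility_id_pudl columns.
    """
    # Left merge keeps every data record, even if its utility has no PUDL ID.
    merged = records.merge(
        id_map[["utility_id_eia", "utility_id_pudl"]].drop_duplicates(),
        on="utility_id_eia",
        how="left",
    )
    # Unmapped means the PUDL ID is null, either missing from the map or blank in it.
    unmapped = merged[merged["utility_id_pudl"].isna()]
    years = pd.to_datetime(unmapped["report_date"]).dt.year
    return unmapped.groupby(years)["utility_id_eia"].nunique()


# Toy data (made up): utility 1 is mapped, utility 2 has a blank PUDL ID,
# and utility 3 is absent from the mapping entirely.
records = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2, 3, 3],
        "report_date": ["2009-01-01", "2016-01-01", "2016-01-01", "2016-01-01", "2017-01-01"],
    }
)
id_map = pd.DataFrame({"utility_id_eia": [1, 2], "utility_id_pudl": [10, None]})

counts = unmapped_utilities_by_year(records, id_map)
print(counts)
```

Counting distinct IDs per year, rather than rows, keeps a utility with many records from inflating the totals. The same function could be pointed at the EIA 860 tables as long as they carry the same two ID columns and a date column.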