catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
471 stars 108 forks source link

Write transform method for EIA 861 Balancing Authority table #603

Closed zaneselvans closed 4 years ago

zaneselvans commented 4 years ago

We need the EIA 861 Balancing Authority table to do the GridLab project allocating demand geographically, but it's also part of the Sloan grant.

This table had one form up through 2012, and another form from 2013 to the present (with 2013 being a bit of a mix). The earlier version indicated which utilities were part of what balancing authorities -- with many entries for each balancing authority, pertaining to each balanced authority. The latter version doesn't include managed utilities, and instead has a collection of Balancing Authority codes, and an indication of which states each balancing authority is operating in. Consequently, the early tables have +3000 records per year, while the latter ones only have about 200. Depending on how consistent these attributes are over time, it may be possible to use the latter data to fill in which states the earlier balancing authority records are associated with, and the latter data to fill in which utilities and BA codes the earlier records are associated with. The only consistently reported values across the entire dataset are the Balancing Authority IDs and Names.

So it seems like the natural primary key here is the Balancing Authority ID, and we need to compile the associated states and BA codes from all of the years (though, it's not 100% clear to me how consistent these mappings should be from year to year).

There's also some confusing overlap between the BA IDs and Utility IDs -- some entities act as both, and some ID's show up in both sets associated with the same entity... while other IDs show up in both sets associated with different entities apparently, which is, ugh. Confusing. Just looking at the sets of IDs (from 2010-2018 for the balancing authority table, and 2009-2018 for the EIA 860 Plants table output, we see:

So 99% of Utility IDs are not BA IDs, and about 10% of BA IDs are apparently not Utility IDs, and we need to take a harder look at the IDs which show up both as BA IDs and Util IDs to understand whether they are meant to represent the same entities in general, or not, and why the same entity (like PacifiCorp) has two IDs -- one as a Util and one as a BA, and why some IDs are associated with totally different entities when used as a BA ID vs. being used as a Util ID.

Problem IDs

These IDs show up as both BA IDs and Util IDs, but may be related to different entities, at least based on their names (ID: BA Name / Util Name):

zaneselvans commented 4 years ago

It almost seems like the BA and Utility IDs really are totally differet sets of IDs that are not required to refer to the same entity, but that EIA has gone out of their way to preferentially use the same numerical ID to represent the same entity when they appear in both sets of IDs, which is hella confusing.

zaneselvans commented 4 years ago

The structure of the reporting entities seems to be something like:

zaneselvans commented 4 years ago

In the context of integration with the FERC 714 data, if we have a Planning Area which is associated with a Utility, it seems straightforward to look up the collection of counties associated with that utility in service_territory_eia861 If we have a Planning Area that maps to a Balancing Authority, we can look up the utilities which were associated with that BA in 2010-2012, but not later, unless we want to try and figure out how to use the BA Code and the list of plants that are associated with that code to infer a set of utilities, whose service territories we can then go look up.

Given that both Utilities and BAs and their IDs are being reported in the FERC 714 Planning Area table, how will we know whether to try and look that ID up in the Balancing Authority vs. the Utility tables? Especially given that the ID is often the same in both? If the ID shows up as a utility with enumerated counties defining its service territory, can we be sure that there aren't other utilities who should also be included as part of the planning area, if that same utility shows up in the Balancing Authority table? Which should take precedence? If we go with BA, will we double count? If we go with Utility only, will we miss service territory? See also: #597 #601

zaneselvans commented 4 years ago

Notes from chat with @cmgosnell: