Closed zaneselvans closed 4 years ago
It almost seems like the BA and Utility IDs really are totally differet sets of IDs that are not required to refer to the same entity, but that EIA has gone out of their way to preferentially use the same numerical ID to represent the same entity when they appear in both sets of IDs, which is hella confusing.
The structure of the reporting entities seems to be something like:
balancing_authority_eia861
table[balancing_authority_eia861
table as of 2013, but I think we've associated those BA codes with plants for earlier years in the ID harvesting and infill process. Not sure if the NA values in here are because there were inconsistent BA values reported for individual plants (and so we Nulled it out) or if there really were some plants that never had a BA associated with them (maybe plants that only appear in the pre-2013 data?).In the context of integration with the FERC 714 data, if we have a Planning Area which is associated with a Utility, it seems straightforward to look up the collection of counties associated with that utility in service_territory_eia861
If we have a Planning Area that maps to a Balancing Authority, we can look up the utilities which were associated with that BA in 2010-2012, but not later, unless we want to try and figure out how to use the BA Code and the list of plants that are associated with that code to infer a set of utilities, whose service territories we can then go look up.
Given that both Utilities and BAs and their IDs are being reported in the FERC 714 Planning Area table, how will we know whether to try and look that ID up in the Balancing Authority vs. the Utility tables? Especially given that the ID is often the same in both? If the ID shows up as a utility with enumerated counties defining its service territory, can we be sure that there aren't other utilities who should also be included as part of the planning area, if that same utility shows up in the Balancing Authority table? Which should take precedence? If we go with BA, will we double count? If we go with Utility only, will we miss service territory? See also: #597 #601
Notes from chat with @cmgosnell:
We need the EIA 861 Balancing Authority table to do the GridLab project allocating demand geographically, but it's also part of the Sloan grant.
This table had one form up through 2012, and another form from 2013 to the present (with 2013 being a bit of a mix). The earlier version indicated which utilities were part of what balancing authorities -- with many entries for each balancing authority, pertaining to each balanced authority. The latter version doesn't include managed utilities, and instead has a collection of Balancing Authority codes, and an indication of which states each balancing authority is operating in. Consequently, the early tables have +3000 records per year, while the latter ones only have about 200. Depending on how consistent these attributes are over time, it may be possible to use the latter data to fill in which states the earlier balancing authority records are associated with, and the latter data to fill in which utilities and BA codes the earlier records are associated with. The only consistently reported values across the entire dataset are the Balancing Authority IDs and Names.
So it seems like the natural primary key here is the Balancing Authority ID, and we need to compile the associated states and BA codes from all of the years (though, it's not 100% clear to me how consistent these mappings should be from year to year).
There's also some confusing overlap between the BA IDs and Utility IDs -- some entities act as both, and some ID's show up in both sets associated with the same entity... while other IDs show up in both sets associated with different entities apparently, which is, ugh. Confusing. Just looking at the sets of IDs (from 2010-2018 for the balancing authority table, and 2009-2018 for the EIA 860 Plants table output, we see:
So 99% of Utility IDs are not BA IDs, and about 10% of BA IDs are apparently not Utility IDs, and we need to take a harder look at the IDs which show up both as BA IDs and Util IDs to understand whether they are meant to represent the same entities in general, or not, and why the same entity (like PacifiCorp) has two IDs -- one as a Util and one as a BA, and why some IDs are associated with totally different entities when used as a BA ID vs. being used as a Util ID.
Problem IDs
These IDs show up as both BA IDs and Util IDs, but may be related to different entities, at least based on their names (ID: BA Name / Util Name):