The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Currently, we harvest attributes seen in association with various entities (utilities, plants, generators, balancing authorities...) after the initial table-by-table transform process. This integrates information that is available, and should be consistent, across a family of data sources. Right now those are primarily the EIA 860 and 923, but harvesting also needs to work with the EIA 861, and potentially with other sources that refer to EIA IDs, like the EPA CEMS.
EIA 861 is out of scope for this epic, but it is still important to consider here. Adding the EIA 861 dataset will require some additional kinds of table normalization / entity harvesting, and there are some known issues with the harvesting process that we may want to address along the way.
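As a rough illustration of what "harvesting" means here, the sketch below collects every observed value of an entity attribute across several already-transformed tables. It is a minimal example under stated assumptions, not the actual PUDL implementation: the table names, column names, sample rows, and the `harvest_observations` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-transformed tables, each a list of row dicts.
# Table names, column names, and values are illustrative only.
plants_eia860 = [
    {"plant_id_eia": 1, "plant_name": "Alpha", "state": "CO"},
    {"plant_id_eia": 2, "plant_name": "Beta", "state": "TX"},
]
generation_eia923 = [
    {"plant_id_eia": 1, "plant_name": "Alpha", "net_generation_mwh": 120.0},
    {"plant_id_eia": 2, "plant_name": "Beta Station", "net_generation_mwh": 75.5},
]


def harvest_observations(tables, id_col, attr_cols):
    """Collect every observed value of each attribute, keyed by entity ID."""
    observed = defaultdict(lambda: defaultdict(list))
    for rows in tables:
        for row in rows:
            if id_col not in row:
                continue  # this row does not refer to the entity at all
            for col in attr_cols:
                value = row.get(col)
                if value is not None:
                    observed[row[id_col]][col].append(value)
    return observed


observations = harvest_observations(
    [plants_eia860, generation_eia923],
    id_col="plant_id_eia",
    attr_cols=["plant_name", "state"],
)
```

In this toy data, plant 2 is reported under two different names, which is the kind of inconsistency the harvesting step has to detect and resolve before an attribute can be stored once in an entity table.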
In Scope
[ ] #613
[ ] #641
[ ] #509
[ ] #1249
[ ] #1247
[ ] #1266
[ ] #1280
[x] #1281
Out of Scope
#614 The EIA 861 data needs to be integrated into the harvesting / normalization process, but that will come after we get the current data working in the new harvesting process.
#640 We need to be able to harvest association tables, but this will first become an issue with the EIA 861 and FERC 714 datasets, which will be integrated after this epic is complete.
Our existing process for checking whether harvested attributes are self-consistent enough to be included in the database is particularly fragile. We will need to go through column by column to identify what threshold we want to choose, and whether there are special cases for measuring consistency / resolving inconsistencies (as with #1280). That additional work will happen in another epic, once the existing data is working in the new system.
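To make the threshold question concrete, here is a minimal sketch of a per-column consistency check. It assumes consistency is measured as the share of observations that agree with the most common value. The `resolve_attribute` helper, the `THRESHOLDS` mapping, and the specific threshold values are hypothetical and are not taken from the existing code.

```python
from collections import Counter


def resolve_attribute(values, threshold=0.7):
    """Return (value, ratio) for a list of observed values.

    `value` is the most common observation if its share of all
    observations meets the threshold, otherwise None.
    """
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    ratio = count / len(values)
    return (value if ratio >= threshold else None), ratio


# Hypothetical per-column thresholds; choosing the real ones is the
# column-by-column work described above.
THRESHOLDS = {"plant_name": 0.7, "latitude": 0.9}

name, name_ratio = resolve_attribute(
    ["Alpha", "Alpha", "Alpha", "Alfa"], THRESHOLDS["plant_name"]
)
lat, lat_ratio = resolve_attribute([39.7, 39.7, 39.8], THRESHOLDS["latitude"])
```

Here the plant name is accepted because three of four observations agree, while the latitude is rejected because only two of three do. A special case like the one in #1280 would likely need its own comparison rule rather than exact equality, which is part of why a single global threshold is fragile.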
I am confused about whether the new functionality and current issues described here are relative to the current code or to the code proposed in #806. It does seem to me that #806 addresses at least some of these.