catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars, 106 forks

Refactor Harvesting [Sloan] #639

Open zaneselvans opened 4 years ago

zaneselvans commented 4 years ago

Currently, we harvest the attributes seen in association with various entities (utilities, plants, generators, balancing authorities...) after the initial table-by-table transform process. Harvesting integrates information that is available, and should be consistent, across a family of data sources. Right now that is primarily the EIA 860 and 923, but it also needs to work with the EIA 861, and potentially with other sources that refer to EIA IDs, like the EPA CEMS.

EIA 861 is out of scope for this epic, but it is still important to consider here. Adding the EIA 861 dataset requires some additional kinds of table normalization / entity harvesting, and there are some known issues with the harvesting process that we may want to address along the way.
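To make the idea of harvesting concrete, here is a minimal sketch, not PUDL's actual implementation. It assumes each transformed table is a list of record dicts, pools every value reported for an attribute under a given entity ID, and keeps the most common value only if it is consistent enough. The function name `harvest_attribute`, the `min_consistency` threshold of 0.7, and the sample records are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def harvest_attribute(tables, id_col, attr_col, min_consistency=0.7):
    """Pick one value of attr_col per entity from records spread over tables.

    tables: iterable of lists of dict records (hypothetical stand-ins for
    already-transformed tables). Returns {entity_id: value or None}, where
    None means no single value was reported consistently enough.
    """
    observed = defaultdict(Counter)
    for records in tables:
        for rec in records:
            entity_id = rec.get(id_col)
            value = rec.get(attr_col)
            # Skip records that lack the entity ID or the attribute.
            if entity_id is None or value is None:
                continue
            observed[entity_id][value] += 1

    harvested = {}
    for entity_id, counts in observed.items():
        value, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        # Keep the most common value only if it dominates the observations.
        harvested[entity_id] = value if share >= min_consistency else None
    return harvested


# Illustrative records only; not real EIA data.
plants_860 = [
    {"plant_id_eia": 1, "plant_name": "Alpha", "state": "CO"},
    {"plant_id_eia": 2, "plant_name": "Beta", "state": "TX"},
]
generation_923 = [
    {"plant_id_eia": 1, "state": "CO", "net_generation_mwh": 100.0},
    {"plant_id_eia": 1, "state": "CO", "net_generation_mwh": 120.0},
    {"plant_id_eia": 2, "state": "NM", "net_generation_mwh": 80.0},
]

states = harvest_attribute([plants_860, generation_923], "plant_id_eia", "state")
print(states)  # {1: 'CO', 2: None}
```

Plant 1 reports the same state everywhere, so the value is harvested. Plant 2 reports two conflicting states with equal weight, so the sketch returns `None` rather than guessing. A real harvesting step would also need to decide which tables are allowed to contribute to which entity, and how to surface the inconsistent cases.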

In Scope

Out of Scope

ezwelty commented 2 years ago

I am unsure whether the new functionality and current issues described here are relative to the current code or to the code proposed in #806. It seems to me that #806 addresses at least some of them.