catalyst-cooperative / rmi-ferc1-eia

A collaboration with RMI to integrate FERC Form 1 and EIA CapEx and OpEx reporting
MIT License

Remove false granularities in the MUL #24

Closed cmgosnell closed 4 years ago

cmgosnell commented 4 years ago

We have a ton of duplication in the EIA master unit list (MUL) generation process, and I believe this will lead to fewer perfect matches between EIA and FERC records. If two records are identical except for their IDs, then it will be harder for the model to choose between them.

Why the false granularity: when a plant has only one generator, the plant record and the generator record will have the same values. When a unit has only one primary fuel, the unit record and the fuel record will have the same values. And so on for other plant parts.
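A minimal sketch of how records that are identical except for their IDs could be flagged. The record layout, the field names (`record_id`, `plant_id`, `capacity_mw`, `net_gen_mwh`) and the `true_gran` / `same_as` flags are hypothetical illustrations, not the actual MUL schema or the repo's implementation.

```python
# Hypothetical toy records; field names are illustrative assumptions,
# not the real master unit list columns.
records = [
    # Plant 1 has a single generator, so both records carry the same values.
    {"record_id": "1_plant", "plant_id": 1, "capacity_mw": 50.0, "net_gen_mwh": 1000.0},
    {"record_id": "1_gen_a", "plant_id": 1, "capacity_mw": 50.0, "net_gen_mwh": 1000.0},
    # Plant 2 has two generators, so every record is distinct.
    {"record_id": "2_plant", "plant_id": 2, "capacity_mw": 120.0, "net_gen_mwh": 2400.0},
    {"record_id": "2_gen_a", "plant_id": 2, "capacity_mw": 70.0, "net_gen_mwh": 1500.0},
    {"record_id": "2_gen_b", "plant_id": 2, "capacity_mw": 50.0, "net_gen_mwh": 900.0},
]

# Every field except the ID takes part in the comparison.
VALUE_FIELDS = ("plant_id", "capacity_mw", "net_gen_mwh")


def flag_false_granularities(records, value_fields=VALUE_FIELDS):
    """Mark the first record with a given set of values as the true granularity."""
    first_seen = {}  # values tuple -> record_id of the first record with those values
    flagged = []
    for rec in records:
        key = tuple(rec[field] for field in value_fields)
        first_id = first_seen.setdefault(key, rec["record_id"])
        flagged.append(
            {**rec, "true_gran": first_id == rec["record_id"], "same_as": first_id}
        )
    return flagged


flagged = flag_false_granularities(records)
kept = [rec["record_id"] for rec in flagged if rec["true_gran"]]
```

Here the first record seen wins, so the input order decides which level (plant before generator in this toy data) counts as the true one. With pandas, `DataFrame.duplicated(subset=...)` over the non-ID columns would express the same check.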

cmgosnell commented 4 years ago

Hmm... I may need to reopen this issue or open a related one. While working on Issue #23, I found several matches with the same sum of weighted features. They look like a different kind of false granularity. I was originally thinking about parent/child granularities (plants have generators, plants have technologies) but not peer false granularities. In the example below, unit 1's generators all have a plant technology of natural_gas_fired_combined_cycle, so the unit record and the technology record are actually the same collection of plant parts.

image.png
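One way to picture peer false granularities is to compare the set of generators each record aggregates: two records covering the same generators are the same collection of plant parts, whatever they are called. This is a rough sketch only. The record names and generator IDs are made up, and the mapping from records to generators is an assumed input rather than something the MUL is known to expose in this form.

```python
from collections import defaultdict

# Hypothetical mapping from each plant-part record to the generators it covers.
record_generators = {
    "plant_1": {"g1", "g2", "g3", "g4"},
    "unit_1": {"g1", "g2", "g3"},
    # Every generator in unit 1 shares one technology, so the technology
    # record covers exactly the same generators as the unit record.
    "tech_ngcc": {"g1", "g2", "g3"},
    "gen_g4": {"g4"},
}


def group_by_generator_set(record_generators):
    """Return groups of record IDs that aggregate identical generator sets."""
    groups = defaultdict(list)
    for record_id, generators in record_generators.items():
        # frozenset is hashable, so it can serve as the grouping key.
        groups[frozenset(generators)].append(record_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


peer_duplicates = group_by_generator_set(record_generators)
```

Comparing generator sets catches both the parent/child and the peer cases, since neither depends on how the plant parts are nested.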