investigate the murky wins

Hoookay. I'm about to check in some changes (mostly fixes for bugs I found through this investigation). I've narrowed in on a distinction ratio, which is used to determine whether a winning match is murky or not, of about 0.2 of the IQR of the possible matches. With a few bug fixes, the results are looking relatively reasonable.
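For anyone following along, here is a rough sketch of what a murky check along these lines could look like. It is not the code in the repo: the function name `is_murky` is made up, and it assumes the test compares the gap between the best and second-best candidate scores against 0.2 times the IQR of all candidate scores for one FERC record.

```python
from statistics import quantiles

DISTINCTION_RATIO = 0.2  # the "about 0.2 of the IQR" mentioned above


def is_murky(scores, ratio=DISTINCTION_RATIO):
    """Flag a winning match as murky (hypothetical helper, not the repo's code).

    Assumption: the winner is murky when its lead over the runner-up is no
    larger than `ratio` times the interquartile range of the candidate scores.
    """
    ordered = sorted(scores, reverse=True)
    if len(ordered) < 2:
        # A single candidate has no runner-up to be confused with.
        return False
    # Quartiles with linear interpolation between the observed scores.
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    gap = ordered[0] - ordered[1]
    return gap <= ratio * iqr


# Illustrative scores only, not real model output.
close_call = is_murky([0.90, 0.89, 0.50, 0.30, 0.10])
clear_win = is_murky([0.90, 0.50, 0.45, 0.40, 0.35])
```

With these made-up scores, the first winner leads by 0.01 against an IQR of 0.59, so it is flagged; the second leads by 0.4 against an IQR of 0.1, so it is not.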

cataloging murky wins by record_id_ferc

truly ambiguous plants:

f1_steam_2018_12_134_2_3: the ferc reported capacity is 140 MW, and there is no plant record from EIA that has a similar capacity.
f1_steam_2018_12_56_2_2: this one looks like the ferc data is just wrong. Capacity is likely much too low.
f1_steam_2018_12_1_0_3: based on the ferc plant name 'rockport total aeg', I would assume this is the AEP portion of the plant, which would be 6166_2018_plant_owned_343, but the capacity is off between the ferc record and the AEP percentage in EIA (1300 vs. 910). Also, the net generation is coming up as 0 in the weighted comparison feature, which feels wrong.

solvable w/ a combinatorial record merge

f1_steam_2018_12_145_2_4: 'fort st. vrain 1-4'
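A rough sketch of the combinatorial idea, in case it helps: try every combination of EIA generator records for a plant and keep the one whose summed capacity is closest to the FERC capacity. Everything here is hypothetical (the function name, the generator ids, and the capacities are made up for illustration, not pulled from the Fort St. Vrain records), and a real version would probably compare more than capacity.

```python
from itertools import combinations


def best_combination(ferc_capacity_mw, eia_capacities_mw):
    """Return the ids whose summed capacity is closest to the FERC value.

    `eia_capacities_mw` maps a (hypothetical) generator id to its capacity.
    Brute force over all non-empty subsets, so only sensible for small plants.
    """
    best_ids, best_diff = (), float("inf")
    ids = sorted(eia_capacities_mw)
    for size in range(1, len(ids) + 1):
        for combo in combinations(ids, size):
            total = sum(eia_capacities_mw[i] for i in combo)
            diff = abs(total - ferc_capacity_mw)
            if diff < best_diff:
                best_ids, best_diff = combo, diff
    return best_ids, best_diff


# Made-up capacities: units 1-4 sum to 540 MW, gen_5 is an unrelated unit.
gens = {"gen_1": 120.0, "gen_2": 130.0, "gen_3": 140.0, "gen_4": 150.0, "gen_5": 45.0}
merged_ids, capacity_diff = best_combination(540.0, gens)
```

With these made-up numbers, the only subset that sums to 540 MW is gen_1 through gen_4, so that combination would be the merged record to compare against the FERC row.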

solvable ambiguous plants:

f1_steam_2018_12_194_4_2: 'riverside unit 2' got murkily matched to 55641_ctg1_2018_plant_gen_total_20856... the ctg2 is the same size but had more divergent net gen. Adding the name will hopefully solve this.
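To illustrate how a name feature could break this kind of tie, here is a small sketch using `difflib.SequenceMatcher` from the standard library. This is only an assumption about how the name might be compared; the helper names, the candidate ids, and the EIA-side names below are hypothetical and not taken from the actual records.

```python
from difflib import SequenceMatcher


def name_similarity(ferc_name, eia_name):
    """Similarity in [0, 1] between two plant names (case-insensitive)."""
    a = ferc_name.lower().strip()
    b = eia_name.lower().strip()
    return SequenceMatcher(None, a, b).ratio()


def break_tie(ferc_name, candidates):
    """Pick the candidate id whose name is most similar to the FERC name.

    `candidates` maps a (hypothetical) candidate record id to its name.
    """
    return max(candidates, key=lambda rec_id: name_similarity(ferc_name, candidates[rec_id]))


# Hypothetical candidate names for two otherwise equal-sized units.
candidates = {"candidate_a": "riverside 1", "candidate_b": "riverside 2"}
winner = break_tie("riverside unit 2", candidates)
```

In the real model the name similarity would be one more comparison feature alongside capacity and net generation, rather than a hard tie-breaker like this.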

solvable... w/ better capacity allocation across owners:

f1_steam_2018_12_89_0_2: ...

checked and looks good.. just low diffs

f1_steam_2018_12_22_0_4
1241_2_2018_plant_unit_owned_10005 (and many more)

checked and murk resulted in wrong winner.. barely

6043_3_2018_plant_unit_total_6452: ferc's "martin 8" record appears to be the unit associated with '3' (all of the 8* generators). The capacity makes this very clear, but the capacity factor diff tipped it over the edge just barely... maybe we need to add some more capacity-heavy training data.
- f1_steam_2018_12_122_01... this one is just weird... I think the "right" answer (which I believe is the whole plant) was getting undervalued because the fuel_type_code_pudl for the plant was unlabeled, since there were two codes for the plant.

catalyst-cooperative / rmi-ferc1-eia

investigate the murky wins #31
