Integrate fuel proportions into FERC Plant ID assignment

This can be done in the pudl.ferc1.transform process by merging the dataframe created by fuel_by_plant() with the steam_ferc1 dataframe before it goes into the FERCPlantClassifier, and by modifying the FERCPlantClassifier to take the fuel proportions into account as well. However, looking into this has raised some questions for me:

How can the proportions of all the fuels be unified into a single "feature", rather than treating each of them individually, so that the importance of the fuel proportions feature can be scaled independently of the other features?
Do all of the Normalization elements of the Classifier really do anything? They operate on rows, but there's only a single column in the features that they're normalizing, so what is there to compare against for scale?
Might it be simpler, and good enough, to just create a categorical variable for the primary fuel consumed and encode it as a OneHot feature?
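
To make these three questions concrete, here is a minimal pure-Python sketch. It is not the FERCPlantClassifier code: the helper names, the fuel columns (coal, gas, oil), the `FUEL_WEIGHT` value, and all the numbers are made up for illustration. It assumes the normalization in question is a row-wise L2 scaling, as described above. Under that assumption a single-column feature collapses to 1.0 in every row, while a block of fuel fractions treated as one vector gives a single cosine similarity that can be weighted as one feature.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length (all-zero rows stay zero)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm if norm else 0.0 for x in row])
    return out


def cosine_similarity(a, b):
    """Dot product of the two row-normalized vectors."""
    a_unit, b_unit = l2_normalize_rows([a, b])
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Question 2: a single-column feature (hypothetical capacities in MW).
# Row-wise normalization divides each value by its own magnitude,
# so every nonzero row becomes [1.0] and the scale information is lost.
capacity = [[50.0], [500.0], [1200.0]]
single_col = l2_normalize_rows(capacity)

# Question 1: treat the fuel fractions as ONE multi-column block.
# Hypothetical column order: coal, gas, oil.
plant_a_1994 = [0.90, 0.10, 0.00]
plant_a_1995 = [0.85, 0.15, 0.00]
plant_b_1995 = [0.00, 0.95, 0.05]

FUEL_WEIGHT = 2.0  # hypothetical: one knob scaling the whole fuel block
same_plant = FUEL_WEIGHT * cosine_similarity(plant_a_1994, plant_a_1995)
other_plant = FUEL_WEIGHT * cosine_similarity(plant_a_1994, plant_b_1995)


# Question 3: reduce the fractions to a primary-fuel one-hot vector.
def primary_fuel_one_hot(fractions):
    """Return 1.0 in the position of the largest fraction, 0.0 elsewhere."""
    top = max(range(len(fractions)), key=lambda i: fractions[i])
    return [1.0 if i == top else 0.0 for i in range(len(fractions))]


print(single_col)
print(round(same_plant, 3), round(other_plant, 3))
print(primary_fuel_one_hot(plant_a_1994), primary_fuel_one_hot(plant_b_1995))
```

The one-hot version is simpler, but it throws away the difference between a 90/10 and a 55/45 coal/gas split, which the fraction-vector similarity retains.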

Initial fiddling with the setup didn't seem to improve (or even noticeably change) the ID generation process, so I'm wondering if I'm doing something stupid/wrong here. The assigned IDs still have the same issue: about 1200-1300 plant records (10% of all the plant records) are left out and assigned orphan IDs, even though they appear to be part of very well-defined plant time series when I look by hand. So this is something to be addressed in conjunction with #221 and #144.

catalyst-cooperative / pudl

Integrate fuel proportions into FERC Plant ID assignment #266