zaneselvans closed 5 years ago
This can be done in the pudl.ferc1.transform process by merging the dataframe created by fuel_by_plant() with the steam_ferc1 dataframe before it goes into the FERCPlantClassifier, and by modifying the FERCPlantClassifier to pay attention to the fuel proportions as well. However, looking at this a bit has brought up some questions for me:
Initial fiddling with the setup didn't seem to improve (or even noticeably change) the ID generation process, so I'm wondering if I'm doing something stupid/wrong here. The assigned IDs still have the same issue: about 1200-1300 plant records (roughly 10% of all the plant records) are left out and assigned orphan IDs, even though they appear to be part of very well-defined plant time series when I look at them by hand. So this is something to be addressed in conjunction with #221 and #144.
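For reference, the merge step described above might look roughly like the sketch below. The two dataframes are tiny hypothetical stand-ins for steam_ferc1 and the output of fuel_by_plant(), and the column names (utility_id_ferc1, coal_fraction_mmbtu, and so on) are assumptions for illustration, not necessarily the real PUDL schema.

```python
import pandas as pd

# Hypothetical stand-in for the steam_ferc1 plant records.
steam = pd.DataFrame({
    "report_year": [2015, 2015, 2016],
    "utility_id_ferc1": [1, 2, 1],
    "plant_name": ["alpha", "beta", "alpha"],
    "capacity_mw": [500.0, 120.0, 500.0],
})

# Hypothetical stand-in for the fuel_by_plant() output:
# one row per plant-year with the share of heat content by fuel.
fuel_fractions = pd.DataFrame({
    "report_year": [2015, 2016],
    "utility_id_ferc1": [1, 1],
    "plant_name": ["alpha", "alpha"],
    "coal_fraction_mmbtu": [0.9, 0.85],
    "gas_fraction_mmbtu": [0.1, 0.15],
})

# Assumed join keys identifying a plant-year record.
key_cols = ["report_year", "utility_id_ferc1", "plant_name"]
fraction_cols = ["coal_fraction_mmbtu", "gas_fraction_mmbtu"]

# A left merge keeps every plant record, including those without fuel data.
# validate= raises if the keys are duplicated on either side.
steam_with_fuel = steam.merge(
    fuel_fractions, on=key_cols, how="left", validate="one_to_one"
)

# Placeholder choice: treat missing fuel data as zero so the classifier
# only sees numeric features. Other handling may be more appropriate.
steam_with_fuel[fraction_cols] = steam_with_fuel[fraction_cols].fillna(0.0)
```

The fraction columns would then need to be added to whatever feature list the FERCPlantClassifier compares, otherwise the merge alone would not be expected to change the assigned IDs.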
FERC Plant ID assignment (#144) can be greatly improved by adding the relative proportions (and potentially absolute amounts) of fuel heat content, and possibly fuel costs, to the set of features that are used to link plant records together. Now that there is an easy way to generate those proportions on a per-plant-year basis, they should be integrated into the ID generation.
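To make "relative proportions of fuel heat content on a per-plant-year basis" concrete, here is a minimal pandas sketch. It is not the actual fuel_by_plant() implementation; the input table and its column names (fuel_type, fuel_mmbtu) are hypothetical.

```python
import pandas as pd

# Hypothetical fuel records: one row per plant, year, and fuel type.
fuel = pd.DataFrame({
    "report_year": [2015, 2015, 2015, 2016],
    "plant_name": ["alpha", "alpha", "beta", "alpha"],
    "fuel_type": ["coal", "gas", "gas", "coal"],
    "fuel_mmbtu": [900.0, 100.0, 50.0, 400.0],
})

# Total heat content per plant-year, with one column per fuel type.
# Fuels a plant did not report in a given year are filled with zero.
heat_by_fuel = fuel.pivot_table(
    index=["report_year", "plant_name"],
    columns="fuel_type",
    values="fuel_mmbtu",
    aggfunc="sum",
    fill_value=0.0,
)

# Divide each row by its total so the fuel shares sum to 1.
# A plant-year with zero total heat content would yield NaN here.
fractions = (
    heat_by_fuel.div(heat_by_fuel.sum(axis=1), axis=0)
    .add_suffix("_fraction_mmbtu")
    .reset_index()
)
```

Because the shares are normalized per plant-year, they stay comparable across plants of very different sizes, which is what makes them plausible linking features alongside the absolute amounts.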