Using `make_column_transformer` and `OneHotEncoder`, make a feature matrix for both the EIA and FERC DataFrames. Luckily, scikit-learn now offers what looks like pretty nice integration with pandas.
I'm sure I'll have to come back and refine this, but with the recordlinkage package it was relatively straightforward. All of the categorical features that are compared with exact
I'm going to lean heavily on this notebook and its accompanying video: https://github.com/justmarkham/scikit-learn-videos/blob/master/10_categorical_features.ipynb https://www.youtube.com/watch?v=irHhDMbw3xo