catalyst-cooperative / rmi-ferc1-eia

A collaboration with RMI to integrate FERC Form 1 and EIA CapEx and OpEx reporting
MIT License

Prep feature matrix with categorical features #9

Closed: cmgosnell closed this 4 years ago

cmgosnell commented 4 years ago

Using make_column_transformer and OneHotEncoder, build a feature matrix for both the EIA and FERC dataframes. Luckily, scikit-learn now offers what looks like pretty nice integration with pandas.

I'm going to lean heavily on this: https://github.com/justmarkham/scikit-learn-videos/blob/master/10_categorical_features.ipynb https://www.youtube.com/watch?v=irHhDMbw3xo
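A minimal sketch of what this could look like, under stated assumptions: the dataframes `eia_df` and `ferc_df`, their column names (`fuel_type`, `plant_type`, `capacity_mw`), and the values in them are hypothetical placeholders, not the real EIA or FERC schema. Fitting the transformer on the concatenation of both frames is one possible design choice so that both feature matrices share the same one-hot columns.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for the EIA and FERC dataframes (made-up columns and values).
eia_df = pd.DataFrame({
    "fuel_type": ["gas", "coal", "gas"],
    "plant_type": ["steam", "steam", "combustion_turbine"],
    "capacity_mw": [100.0, 550.0, 45.0],
})
ferc_df = pd.DataFrame({
    "fuel_type": ["coal", "gas"],
    "plant_type": ["steam", "steam"],
    "capacity_mw": [540.0, 98.0],
})

# Columns to one-hot encode; everything else is passed through unchanged.
categorical_cols = ["fuel_type", "plant_type"]

column_transformer = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense NumPy array
)

# Fit on both frames together so the encoded columns line up across datasets.
column_transformer.fit(pd.concat([eia_df, ferc_df], ignore_index=True))

eia_features = column_transformer.transform(eia_df)
ferc_features = column_transformer.transform(ferc_df)

print(eia_features.shape, ferc_features.shape)
```

Here each categorical column expands into one indicator column per category seen during fitting, followed by the untouched numeric column. Categories that appear only at transform time would be encoded as all zeros because of `handle_unknown="ignore"`.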

cmgosnell commented 4 years ago

I'm sure I am going to have to come back and refine this, but with the recordlinkage package, this was relatively straightforward. All of the categorical features that are compared with exact