csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License
1.35k stars, 120 forks

init impl #151

Closed OmerRonen closed 1 year ago

OmerRonen commented 1 year ago

@csinva I added support for categorical variables for FIGS.

The interface: the user specifies the names of the categorical columns (we assume X is a pd.DataFrame in this case). I added a function, encode_categories, in imodels.util.data_util that one-hot encodes the data matrix and saves the fitted encoder. Because the saved encoder is reused, the encoded matrix keeps the same dimensions at inference time even if only some of the categories are present. I also added a basic test for it.

The Clalit people asked for this functionality, so let me know what you think!

csinva commented 1 year ago

Nice, this looks good to me and I'll merge it!

Generally, I think stuff like one-hot encoding shouldn't be in imodels since it applies equally to all the models and will introduce redundant code.
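For reference, a sketch of what model-agnostic encoding outside the package could look like with a plain sklearn `Pipeline`. The `DecisionTreeClassifier` is only a stand-in for any sklearn-compatible estimator, and the data and column names are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data (made up for illustration)
X = pd.DataFrame({"age": [30, 45, 52, 23], "color": ["red", "green", "blue", "red"]})
y = [0, 1, 1, 0]

# One-hot encode the categorical column, pass the other columns through unchanged
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

# The estimator is a stand-in; any sklearn-compatible model could go here
model = Pipeline(
    [("preprocess", preprocess), ("clf", DecisionTreeClassifier(random_state=0))]
)
model.fit(X, y)

# An unseen category ("purple") is encoded as all zeros rather than raising
preds = model.predict(pd.DataFrame({"age": [40], "color": ["purple"]}))
```

The encoding step lives in the pipeline, so the same preprocessing works with any estimator without per-model code.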

The big exception is if we're going to properly handle categorical variables (e.g. allow non-binary splits), but this would be model-specific and even sklearn doesn't do this yet.