csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License
1.34k stars 119 forks source link

HSTree 'works' without error with unsupported sklearn models, but doesn't do anything #199

Open bking124 opened 6 months ago

bking124 commented 6 months ago

First off, very much appreciate the package--there are a ton of interesting methods contained here! I was particularly interested in the Hierarchical Shrinkage trees, and had hoped they would be applicable to some of the other gradient boosting methods like LightGBM.

I realize now these are not supported. However, if a user tries to wrap a LGBM classifier with HSTree, they can do so. The fit and predict methods will work without error, but actually nothing changes to the underlying estimator. Similar points were made in #137 and #171. A reproducing example:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from imodels import HSTreeClassifier
import numpy as np

np.random.seed(15)
X, y = datasets.load_breast_cancer(return_X_y=True)  # binary classification

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=10)

#Train LGBM 
myLGBM = LGBMClassifier()
myLGBM.fit(X_train, y_train)

#LGBM 'wrapped' by HSTree
myHST = HSTreeClassifier(LGBMClassifier(), reg_param=1000)
myHST.fit(X_train, y_train)

#Show they predict same probas
np.array_equal(myLGBM.predict_proba(X_test), myHST.predict_proba(X_test)) #true

I think this behavior is a bit misleading, and instead the user should be prompted with a warning/error saying the model is unsupported. From my understanding of the code, a model is unsupported if it contains neither a 'tree' nor an 'estimators' attribute, so a check for these attributes could perhaps be included as part of the initialization code. Please let me know if I've misunderstood anything!