csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License

GreedyRuleListClassifier has wildly varying performance and sometimes crashes #145

Closed davidefiocco closed 1 year ago

davidefiocco commented 1 year ago

When running a certain number of experiments with different splits of a given dataset, I see that GreedyRuleListClassifier's accuracy wildly varies, and sometimes the code (see for loop below) crashes.

So, for example, running 10 experiments like this, with different random splits of the same set:

import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from imodels import GreedyRuleListClassifier

X, Y = sklearn.datasets.load_breast_cancer(as_frame=True, return_X_y=True)

model = GreedyRuleListClassifier(max_depth=10)

for i in range(10):
  try:
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
    model.fit(X_train, y_train, feature_names=X_train.columns)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test.values, y_pred)
    print('Accuracy:\n', score)
  except KeyError:
    print("Failed with KeyError")

will give output along the lines of:

Accuracy: 0.6081871345029239
Failed with KeyError
Accuracy: 0.4619883040935672
Accuracy: 0.45614035087719296
Accuracy: 0.2222222222222222
Failed with KeyError
Failed with KeyError
Failed with KeyError
Accuracy: 0.18128654970760233
Failed with KeyError

Is this intended behavior? While my test dataset is smallish, the variation in accuracy is still surprising to me, and so is the KeyError. I'm using scikit-learn==1.0.2 and imodels==1.3.6, and can edit the issue here to add more details.
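As a side note, the loop above uses a fresh random split each iteration, so a split that triggers the crash cannot be recovered afterwards. A minimal sketch (using only the standard `random_state` parameter of scikit-learn's `train_test_split`) that pins each split to the loop index, so any failing iteration can be re-created exactly:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, Y = load_breast_cancer(as_frame=True, return_X_y=True)

for i in range(10):
    # Seeding the split with the loop index makes each experiment
    # reproducible: rerunning with the same i yields the same split,
    # so a split that raises a KeyError can be inspected in isolation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=i
    )
    print(i, len(X_train), len(X_test))
```

With this change, "experiment 7 fails" becomes an exactly reproducible bug report rather than an intermittent one.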

Incidentally, the same behavior was observed in https://datascience.stackexchange.com/a/116283/50519, noticed by @jonnor.

Thanks!

csinva commented 1 year ago

Thanks for raising this issue! Will look into it shortly...

csinva commented 1 year ago

Hi @davidefiocco, just looked into it. I fixed the KeyError issue and pushed a bumped imodels version, so if you upgrade with pip install --upgrade imodels and rerun, you should no longer get that error. Sorry about that; we haven't been maintaining this model well over time.

The accuracy does indeed fluctuate quite a lot for this dataset. GreedyRuleList is a good algorithm when you are trying to identify a clear subgroup that has a high probability of belonging to a single class, but it does poorly at finding interactions, since it only ever identifies samples from class 1; whatever samples remain after all rules are applied are predicted as class 0.
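The asymmetry described above can be sketched in a few lines. This is not imodels' actual implementation, just an illustration of the prediction scheme: an ordered list of rules, each capturing a class-1 subgroup, with everything unmatched falling through to class 0. The feature names and thresholds are made up:

```python
# Hypothetical fitted rule list: (feature, threshold) pairs, applied in order.
# Each rule predicts class 1 for samples it matches; there is no rule that
# predicts class 0 directly.
rules = [
    ("worst_radius", 16.8),   # hypothetical rule 1 -> predict 1
    ("mean_texture", 21.0),   # hypothetical rule 2 -> predict 1
]

def predict_one(sample: dict) -> int:
    """Apply rules in order; the first match wins, the default class is 0."""
    for feature, threshold in rules:
        if sample[feature] > threshold:
            return 1
    return 0  # fell through every rule: predicted as the "remainder" class

print(predict_one({"worst_radius": 20.0, "mean_texture": 15.0}))  # 1 (rule 1)
print(predict_one({"worst_radius": 10.0, "mean_texture": 25.0}))  # 1 (rule 2)
print(predict_one({"worst_radius": 10.0, "mean_texture": 15.0}))  # 0 (default)
```

Because class 0 is only ever the fall-through, interactions that define class 0 regions inside class 1 regions are hard for the greedy procedure to capture.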

If you want to look into it further, you can visualize some of the models and see how they are overfitting (just add the line model._print_list()).

davidefiocco commented 1 year ago

Thanks so much @csinva and of course absolutely no worries and kudos for your great work on imodels! Thanks for tips as well!

davidefiocco commented 1 year ago

The performance of the model is no longer "wildly varying" after @mcschmitz's fix in https://github.com/csinva/imodels/pull/167, released with 1.3.17 (@csinva FYI!):

Accuracy:
 0.9005847953216374
Accuracy:
 0.9064327485380117
Accuracy:
 0.8947368421052632
Accuracy:
 0.9181286549707602
Accuracy:
 0.8830409356725146
Accuracy:
 0.8947368421052632
Accuracy:
 0.8888888888888888
Accuracy:
 0.9122807017543859
Accuracy:
 0.8947368421052632
Accuracy:
 0.8713450292397661