Closed davidefiocco closed 1 year ago
Thanks for raising this issue! Will look into it shortly...
Hi @davidefiocco, I just looked into it. I fixed the KeyError issue and pushed a bumped imodels version, so if you upgrade imodels with pip install --upgrade and rerun, you should no longer get that error. Sorry about that; we haven't been maintaining this model well over time.
The accuracy does indeed fluctuate quite a lot for this dataset. GRL is a good algorithm when you are trying to identify a clear subgroup that has a high probability of being in a single class, but it does poorly at finding interactions: its rules only ever identify samples from class 1, and the samples remaining after all rules are predicted as class 0.
If you want to look into it further, you can visualize some of the models and see how they are overfitting (you just need to add the line model._print_list()).
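To make the point about class 1 rules and the class 0 default concrete, here is a toy sketch of how such a rule list predicts. It is not the imodels implementation: the `Rule` class, the `predict_rule_list` function and the thresholds are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rule:
    feature: int      # index of the feature this rule tests
    threshold: float  # the rule fires when x[feature] > threshold


def predict_rule_list(rules: Sequence[Rule], x: Sequence[float]) -> int:
    """Check the rules in order; any rule that fires predicts class 1.

    A sample that falls through every rule gets the default class 0.
    """
    for rule in rules:
        if x[rule.feature] > rule.threshold:
            return 1
    return 0


# Hypothetical rules, for illustration only.
rules = [Rule(feature=0, threshold=5.0), Rule(feature=1, threshold=2.5)]

print(predict_rule_list(rules, [6.0, 0.0]))  # first rule fires
print(predict_rule_list(rules, [1.0, 3.0]))  # second rule fires
print(predict_rule_list(rules, [1.0, 1.0]))  # no rule fires, default class
```

Because each rule looks at a single feature and can only vote for class 1, a pattern that depends on two features jointly is hard to express in this form, which is consistent with the behavior described above.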
Thanks so much @csinva, and of course absolutely no worries. Kudos for your great work on imodels!
Thanks for the tips as well!
The performance of the model is not "wildly varying" anymore after @mcschmitz's fix of the behavior in https://github.com/csinva/imodels/pull/167, released with 1.3.17 (@csinva FYI!).
Accuracy:
0.9005847953216374
Accuracy:
0.9064327485380117
Accuracy:
0.8947368421052632
Accuracy:
0.9181286549707602
Accuracy:
0.8830409356725146
Accuracy:
0.8947368421052632
Accuracy:
0.8888888888888888
Accuracy:
0.9122807017543859
Accuracy:
0.8947368421052632
Accuracy:
0.8713450292397661
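As a quick way to put a number on how much these ten runs still vary, the values can be summarized with the standard library. This is only a sketch over the accuracies printed above; the variable names are my own.

```python
from statistics import mean, pstdev

# The ten accuracies reported above, copied verbatim.
accuracies = [
    0.9005847953216374,
    0.9064327485380117,
    0.8947368421052632,
    0.9181286549707602,
    0.8830409356725146,
    0.8947368421052632,
    0.8888888888888888,
    0.9122807017543859,
    0.8947368421052632,
    0.8713450292397661,
]

avg = mean(accuracies)
std = pstdev(accuracies)                     # population standard deviation
spread = max(accuracies) - min(accuracies)   # best run minus worst run

print(f"mean={avg:.4f} std={std:.4f} spread={spread:.4f}")
```

The gap between the best and worst run is under five percentage points.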
When running a number of experiments with different splits of a given dataset, I see that GreedyRuleListClassifier's accuracy varies wildly, and sometimes the code (see the for loop below) crashes. So, for example, running 10 experiments like this, with different random splits of the same set:
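A minimal sketch of such an experiment loop, using only the standard library, is given here. Everything in it is an assumption made for illustration: `run_experiments`, the stand-in `MajorityClassifier`, the toy data and the 30% test split are hypothetical and are not the code or dataset from this issue. The loop reshuffles the data on every run, fits a fresh model, records the test accuracy, and records `None` when fitting or predicting raises a KeyError.

```python
import random
from typing import Callable, List, Optional, Sequence


class MajorityClassifier:
    """Stand-in model (hypothetical): predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def run_experiments(
    make_model: Callable[[], object],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_runs: int = 10,
    test_size: float = 0.3,
    seed: int = 0,
) -> List[Optional[float]]:
    """Fit a fresh model on n_runs random splits; None marks a KeyError crash."""
    rng = random.Random(seed)
    results: List[Optional[float]] = []
    for _ in range(n_runs):
        # New random train/test split for every run.
        idx = list(range(len(X)))
        rng.shuffle(idx)
        n_test = max(1, round(test_size * len(idx)))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        try:
            model = make_model()
            model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = model.predict([X[i] for i in test_idx])
            correct = sum(int(p == y[i]) for p, i in zip(preds, test_idx))
            results.append(correct / n_test)
        except KeyError:
            results.append(None)
    return results


# Toy data, for illustration only.
X = [[float(i)] for i in range(20)]
y = [1 if i >= 8 else 0 for i in range(20)]

results = run_experiments(MajorityClassifier, X, y)
for r in results:
    print("Accuracy:", r if r is not None else "failed with KeyError")
```

With a real estimator, the list-based splitting would normally be replaced by a library splitter, and the model factory would return the classifier under test.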
Will give as output something along the lines of
Is this intended behavior? While my test dataset is smallish, the variation in accuracy is still surprising to me, and so is the throwing of a KeyError. I'm using scikit-learn==1.0.2 and imodels==1.3.6 and can edit the issue here to add more details.
Incidentally, the same behaviour was observed in https://datascience.stackexchange.com/a/116283/50519, noticed by @jonnor.
Thanks!