Closed davidefiocco closed 1 year ago
Thanks for taking the time to document this issue and propose a fix! Much appreciated :)
I've just looked through, and this fix looks good to me! I'll change it now.
Fixed in cee96eff8d254f712f7d858fe5be8aeecd138911
Hi, I think I have encountered a small issue in the current implementation of `GreedyRulesListClassifier` (I'm on `1.3.17`).

In a nutshell, a fitted classifier's `rules_` may not describe all the sets into which the rule list partitions the training set, because a max depth is reached.

The issue is a bit subtle/minor, but here's a description: when running something along the lines of
the rules in output, depending on the realization of the (randomly seeded) `dataset`, may be like:

What's not OK in those rules is that no info is given to describe the fate of the outstanding 1000 - (246 + 99 + 387) = 268 points not involved in the rule set (e.g. those not counted by the various `'num_pts_right'`).

A better behavior is observed instead for other random instances of the dataset, e.g. with the rule set

in which the classification rules explicitly involve all 711 + 9 + 280 = 1000 points (with the outstanding ones included in `num_pts` in the last rule).

The issue may be hard to notice, but can be seen by wrapping the code above in a for loop and printing the sum of the points dealt with by the rules, by adding the snippet (for debugging):
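A minimal sketch of such a check, assuming each entry of `rules_` is a dict in which splitting rules carry `'num_pts_right'` and a final leaf entry carries the leftover points in `'num_pts'`, might look like the following. The helper name `covered_points` and the two hand-written lists are illustrative only (they reuse the counts quoted above and are not real imodels output):

```python
def covered_points(rules):
    """Count the training points a rule list accounts for explicitly.

    Assumption: a splitting rule stores the points it captures under
    'num_pts_right'; an entry without that key is treated as a leaf whose
    'num_pts' holds the remaining points.
    """
    total = 0
    for rule in rules:
        if "num_pts_right" in rule:
            total += rule["num_pts_right"]
        else:
            total += rule["num_pts"]
    return total


# Hand-written examples built from the counts quoted in this issue.
incomplete = [{"num_pts_right": 246}, {"num_pts_right": 99}, {"num_pts_right": 387}]
complete = [{"num_pts_right": 711}, {"num_pts_right": 9}, {"num_pts": 280}]

n_train = 1000  # size of the training set in the examples above
for name, rules in [("incomplete", incomplete), ("complete", complete)]:
    covered = covered_points(rules)
    print(f"{name}: {covered} of {n_train} covered, {n_train - covered} unaccounted for")
```

With a fitted model, one would presumably pass its `rules_` attribute in place of the hand-written lists and compare the result against the number of training rows.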
When doing so, running the code above repeatedly with different random instances of `dataset`, one should see the issue popping up occasionally, getting in stdout something like

One possible fix (needs review @csinva!) would be the replacement of the current
https://github.com/csinva/imodels/blob/1243240fec3aae33852ba680ba6aea66a4f86ca7/imodels/rule_list/greedy_rule_list.py#L61-L63
with

thus providing a count and stats for points when `max_depth` is reached. This should allow rules to partition explicitly all points in the training set.

This needs checking, but I believe that may work, and could possibly be addressed together with https://github.com/csinva/imodels/issues/169. Thanks a lot!