csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License
1.34k stars 119 forks source link

Possible bugs in GreedyRuleListClassifier #167

Closed mcschmitz closed 1 year ago

mcschmitz commented 1 year ago

Hi!

I am not 100% if the following issues I encountered are bug so please bare with me if I simply misunderstood the GreedyRuleListClassifier.

X = np.array([0.8, 0.8, 0.3, 0.3, 0.3, 0.3]).reshape(-1, 1) y = np.array([0, 0, 1, 1, 1, 1])

model = GreedyRuleListClassifier(max_depth=1) model.fit(X, y, verbose=True) y_pred = model.predict(X) print(confusion_matrix(y, y_pred))

[[0 2] [4 0]]

If I change the `predict_proba` method to something like this it returns the expected results.

```python
from sklearn.utils.validation import check_array, check_is_fitted

class MyGreedyRuleListClassifier(GreedyRuleListClassifier):
    def predict_proba(self, X):
        check_is_fitted(self)
        X = check_array(X)
        n = X.shape[0]
        probs = np.zeros(n)
        for i in range(n):
            x = X[i]
            for j, rule in enumerate(self.rules_):
                if j == len(self.rules_) - 1:
                    probs[i] = rule['val']
                    continue
                regular_condition = x[rule["index_col"]] >= rule["cutoff"]
                flipped_condition = x[rule["index_col"]] < rule["cutoff"]
                condition = flipped_condition if rule["flip"] else regular_condition
                if condition:
                    probs[i] = rule['val_right']
                    break
        return np.vstack((1 - probs, probs)).transpose()  # probs (n, 2)

model = MyGreedyRuleListClassifier(max_depth=1)
model.fit(X, y, verbose=True)
y_pred = model.predict(X)
print(confusion_matrix(y, y_pred))
[[2 0]
 [0 4]]

Let me know if you can confirm this. I can provide a PR fixing both issues in that case.

csinva commented 1 year ago

Hi @mcschmitz, wow thank you for finding this! This is a pretty major bug and I'm surprised it wasn't found earlier (GreedyRuleListClassifier isn't a very popular model, but a bug this big should have been causing problems).

A PR fixing these would be great! Let me know if you run into any trouble!

mcschmitz commented 1 year ago

Do you have some form of automation when creating the demo notebooks? If not, I can also push the updates for the notebooks.

On another note: I started with python 3.6, as this is mentioned to work on pypi. However TestBRL.test_integration_fitting & TestFIGS.test_categorical were both failing with AttributeError: 'OneHotEncoder' object has no attribute 'get_feature_names_out'. The python badge on the ReadMe mentions 3.7, but the setup.py as well as the pypi package mention 3.6. I updated on 3.7 and all tests pass, so I didn't try to fix it on 3.6. Just wanted to note that.

csinva commented 1 year ago

Thanks for making this!

There isn't currently any automation, but I will update the notebooks manually. You are right, there is an issue with 3.6 right now, I will try and fix it, but we may drop 3.6 support soon...