csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License

BoostedRulesClassifier sometimes throws an exception #146

Closed: Wan-xiaohui closed this issue 1 year ago

Wan-xiaohui commented 1 year ago

Hi,

When I use the BoostedRulesClassifier, it sometimes throws an exception as follows:

This BoostedRulesClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

I find that the exception comes from the implementation of the `RuleSet` class:

```python
def _eval_weighted_rule_sum(self, X) -> np.ndarray:

    check_is_fitted(self, ['rules_without_feature_names_', 'n_features_', 'feature_placeholders'])

    X = check_array(X)

    if X.shape[1] != self.n_features_:
        raise ValueError("X.shape[1] = %d should be equal to %d, the number of features at training time."
                         " Please reshape your data."
                         % (X.shape[1], self.n_features_))

    df = pd.DataFrame(X, columns=self.feature_placeholders)
    selected_rules = self.rules_without_feature_names_

    scores = np.zeros(X.shape[0])
    for r in selected_rules:
        features_r_uses = list(map(lambda x: x[0], r.agg_dict.keys()))
        scores[df[features_r_uses].query(str(r)).index.values] += r.args[0]

    return scores
```

Specifically, when `check_is_fitted(self, ['rules_without_feature_names_', 'n_features_', 'feature_placeholders'])` runs, it finds that `self.rules_without_feature_names_` does not exist, so it raises the exception above.

Reviewing my code and data set further, I found that my training set is easy to classify, so the estimator's training error is close to zero. This seems to trigger a bug in the `fit` function of the `BoostedRulesClassifier` class:

```python
for _ in range(self.n_estimators):
    # Fit a classifier with the specific weights
    clf = self.estimator()
    clf.fit(X, y, sample_weight=w)  # uses w as the sampling weight!
    preds = clf.predict(X)
    self.estimator_mean_prediction_.append(np.mean(preds))  # just for printing

    # Indicator function
    miss = preds != y

    # Equivalent with 1/-1 to update weights
    miss2 = np.ones(miss.size)
    miss2[~miss] = -1

    # Error
    err_m = np.dot(w, miss) / sum(w)

    if err_m < 1e-3:
        return self

    # Alpha
    alpha_m = 0.5 * np.log((1 - err_m) / float(err_m))

    # New weights
    w = np.multiply(w, np.exp([float(x) * alpha_m for x in miss2]))

    self.estimators_.append(deepcopy(clf))
    self.estimator_weights_.append(alpha_m)
    self.estimator_errors_.append(err_m)

rules = []
```

Because `err_m` is zero (below the `1e-3` threshold), `fit` returns `self` directly without executing the subsequent statements, so `self.rules_without_feature_names_` is never created.
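The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not imodels code: `ToyBoostedClassifier`, its `rules_` attribute, and the list of per-round errors passed to `fit` are all hypothetical stand-ins, and the `hasattr` test is a simplified substitute for a fitted check.

```python
class ToyBoostedClassifier:
    """Hypothetical stand-in for the early-return pattern; not imodels code."""

    def __init__(self, n_estimators=3):
        self.n_estimators = n_estimators

    def fit(self, errors):
        # `errors` is a made-up list of per-round weighted errors, used here
        # in place of training real weak learners.
        self.estimators_ = []
        for err_m in errors[: self.n_estimators]:
            if err_m < 1e-3:
                return self  # early exit: the code after the loop never runs
            self.estimators_.append(err_m)
        # Attribute that the prediction path later relies on.
        self.rules_ = [f"rule_{i}" for i in range(len(self.estimators_))]
        return self

    def predict(self):
        # Simplified fitted check: require the attribute set at the end of fit.
        if not hasattr(self, "rules_"):
            raise RuntimeError("This ToyBoostedClassifier instance is not fitted yet.")
        return self.rules_


ok = ToyBoostedClassifier().fit([0.3, 0.2, 0.1])
print(ok.predict())

broken = ToyBoostedClassifier().fit([0.0])  # first round already has zero error
try:
    broken.predict()
except RuntimeError as exc:
    print("raised:", exc)
```

Even though `fit` was called on `broken`, the early `return self` skips the assignment after the loop, so the fitted check fails at prediction time.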

My current workaround is to modify the following fragment in the `fit` function of the `BoostedRulesClassifier` class:

```python
# Error
err_m = np.dot(w, miss) / sum(w)

# modification ###########################
if err_m < 1e-3:
    # return self
    w = np.ones(miss.size) / len(y)
    self.estimators_.append(deepcopy(clf))
    self.estimator_weights_.append(float("inf"))
    self.estimator_errors_.append(err_m)
    break
####################################
# Alpha
alpha_m = 0.5 * np.log((1 - err_m) / float(err_m))
```

I'm not sure whether this introduces new defects, but it does resolve the exception.
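The workaround stores `float("inf")` as the estimator weight. Whether that is safe depends on how the weights are used later, which this thread does not show. One possible alternative, offered only as a sketch and not as what imodels actually does, is to clip the error away from zero so the weight stays finite. The helper name `adaboost_alpha` and the `eps` parameter are assumptions made for illustration.

```python
import math


def adaboost_alpha(err_m, eps=1e-10):
    """Estimator weight 0.5 * log((1 - err) / err), with err clipped to [eps, 1 - eps]."""
    err = min(max(err_m, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - err) / err)


alpha_perfect = adaboost_alpha(0.0)   # finite, instead of a division by zero
alpha_typical = adaboost_alpha(0.25)  # equals 0.5 * log(3)
print(alpha_perfect, alpha_typical)
```

With clipping, a perfect weak learner gets a large but finite weight, so any later arithmetic on the weights avoids `inf` and `nan` values.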

csinva commented 1 year ago

Thanks for raising this issue! Will look into it...

csinva commented 1 year ago

I believe I just updated the code (and bumped the imodels version) to fix this issue. It should work once you run `pip install --upgrade imodels`.

Best, Chandan

Wan-xiaohui commented 1 year ago

> I believe I just updated the code (and bumped the imodels version) to fix this issue. It should work once you run `pip install --upgrade imodels`.
>
> Best, Chandan

Thanks, it solved my problem!