TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

show_prediction does not show probability for xgboost classifier #290

Closed: RyanZotti closed this issue 5 years ago

RyanZotti commented 5 years ago

I've posted the same question on Stack Overflow, but figured I might get more traction here.

I'm using the show_prediction function in the eli5 package to understand how my XGBoost classifier arrived at a prediction. For some reason, I'm getting a regression score instead of a probability for my model.

Below is a fully reproducible example with a public dataset.

from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from eli5 import show_prediction

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Split the data
train, test, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size=0.33,
    random_state=42
)

# Define the model
xgb_model = XGBClassifier(
    n_jobs=16,
    eval_metric='auc'
)

# Train the model
xgb_model.fit(
    train,
    train_labels
)

show_prediction(xgb_model.get_booster(), test[0], show_feature_values=True, feature_names=feature_names)

This produces the following output. Note that it shows a score instead of a probability.

[Screenshot, 2018-12-14 11:26:01 AM: show_prediction output showing a score rather than a probability]

The docs show that eli5 supports probabilities, though, so clearly I'm doing something wrong.

[Screenshot, 2018-12-14 11:25:49 AM: eli5 documentation example showing a prediction probability]

I'm using eli5 version 0.8 and xgboost version 0.80 if that helps.
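For a binary XGBClassifier trained with the default `binary:logistic` objective, the "score" shown when a raw `Booster` is passed is most likely the raw margin, i.e. the log-odds, and the probability is the logistic sigmoid of that value. A minimal sketch of the conversion, assuming that objective; `margin_to_probability` is a hypothetical helper, not part of eli5 or xgboost:

```python
import math


def margin_to_probability(margin):
    """Map a raw log-odds margin to P(class 1) via the logistic sigmoid.

    Assumes a binary:logistic objective. Hypothetical helper, not an
    eli5 or xgboost function.
    """
    # Numerically stable sigmoid: never call math.exp on a large positive value.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


# A margin of 0 means even odds; positive margins push towards class 1.
print(margin_to_probability(0.0))  # 0.5
print(margin_to_probability(2.0))
```

To cross-check against the fitted model, `xgb_model.predict(test[:1], output_margin=True)` returns the raw margin and `xgb_model.predict_proba(test[:1])[:, 1]` returns the probability; applying the sigmoid to the former should reproduce the latter up to floating-point error.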

RyanZotti commented 5 years ago

It seems to be related to my use of xgb_model.get_booster(). The official documentation doesn't call that; it passes the model as-is instead. When I do that, though, I get TypeError: 'str' object is not callable, so that doesn't seem to be an option.
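That TypeError is consistent with how the xgboost 0.80 scikit-learn wrapper is laid out: the estimator stores its booster *type* as a plain string attribute named `booster` (e.g. `'gbtree'`), while the trained `Booster` object is returned by `get_booster()`. Code written against the older API that calls `xgb.booster()` therefore ends up calling a string. The sketch below reproduces the failure with a hypothetical stand-in class, `FakeXGBModel`; it does not import xgboost and only mimics those two attributes.

```python
class FakeXGBModel:
    """Hypothetical stand-in for the xgboost sklearn wrapper (not the real class)."""

    def __init__(self):
        # The wrapper keeps the booster *type* as a plain string...
        self.booster = "gbtree"
        # ...and the trained Booster separately (placeholder object here).
        self._Booster = object()

    def get_booster(self):
        # The trained Booster is exposed through this method instead.
        return self._Booster


model = FakeXGBModel()
try:
    model.booster()  # old-API call: tries to call the string "gbtree"
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```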

RyanZotti commented 5 years ago

Found the fix.


import eli5
import eli5.xgboost  # make sure eli5's xgboost integration module is loaded
from xgboost import XGBClassifier, XGBRegressor

def _check_booster_args(xgb, is_regression=None):
    # type: (Any, bool) -> Tuple[Booster, bool]
    if isinstance(xgb, eli5.xgboost.Booster):  # patched: was isinstance(xgb, Booster)
        booster = xgb
    else:
        # patched: was xgb.booster(), but in xgboost 0.80 `booster` is a string attribute
        booster = xgb.get_booster()
        _is_regression = isinstance(xgb, XGBRegressor)
        if is_regression is not None and is_regression != _is_regression:
            raise ValueError(
                'Inconsistent is_regression={} passed. '
                'You don\'t have to pass it when using scikit-learn API'
                .format(is_regression))
        is_regression = _is_regression
    return booster, is_regression

# Monkeypatch: replace eli5's internal helper with the fixed version
eli5.xgboost._check_booster_args = _check_booster_args

And then replaced the last line of my code with:

show_prediction(xgb_model, test[0], show_feature_values=True, feature_names=feature_names)

Got the solution from this other issue: https://github.com/TeamHG-Memex/eli5/issues/252

lopuhin commented 5 years ago

Thanks for the report, @RyanZotti. I think we should fix that, so I'll reopen the issue :)

lopuhin commented 5 years ago

This should be fixed with https://github.com/TeamHG-Memex/eli5/pull/268

Davidson919 commented 4 years ago

Hi, I believe I am having a very similar problem to the one described above.

I am running an XGBoost classification model, and the expected value when using explain_prediction() is often greater than 1.

I have included a screenshot below.

[Screenshot]

As you can see, my model's prediction and the y_score reported by explain_prediction are not aligned.
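One plausible explanation (an assumption, since the screenshot isn't available here) is that y_score is the raw margin rather than a probability, so values above 1 are expected. In eli5's explanation the per-feature contributions plus the `<BIAS>` term add up to that score. For a binary model the probability is the sigmoid of the score; for a multiclass model each class gets its own score and the probabilities come from a softmax across classes (assuming the `multi:softprob` objective XGBClassifier uses for more than two classes). The sketch below uses made-up contribution values purely for illustration; `scores_to_probabilities` is a hypothetical helper.

```python
import math


def scores_to_probabilities(contributions_by_class):
    """Sum per-class contributions into margins, then softmax them.

    contributions_by_class: {class_label: {feature_name: contribution}}.
    Hypothetical helper; assumes a multi:softprob-style model where each
    class score is a raw margin.
    """
    scores = {label: sum(contribs.values())
              for label, contribs in contributions_by_class.items()}
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return scores, {label: e / total for label, e in exps.items()}


# Illustrative numbers only, not taken from any real model.
example = {
    "a": {"<BIAS>": 0.5, "x1": 1.2, "x2": -0.3},
    "b": {"<BIAS>": 0.5, "x1": -0.4, "x2": 0.1},
    "c": {"<BIAS>": 0.5, "x1": -0.9, "x2": 0.2},
}
scores, probs = scores_to_probabilities(example)
print(scores)  # raw margins: can exceed 1
print(probs)   # probabilities: sum to 1
```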

Could you help?