show_prediction does not show probability for xgboost classifier

RyanZotti commented 5 years ago

I've posted the same question on Stack Overflow, but figured I might get more traction here.

I'm using the show_prediction function in the eli5 package to understand how my XGBoost classifier arrived at a prediction. For some reason I seem to be getting a regression score instead of a probability for my model.

Below is a fully reproducible example with a public dataset.

from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from eli5 import show_prediction

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Split the data
train, test, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size=0.33,
    random_state=42
)

# Define the model
xgb_model = XGBClassifier(
    n_jobs=16,
    eval_metric='auc'
)

# Train the model
xgb_model.fit(
    train,
    train_labels
)

show_prediction(xgb_model.get_booster(), test[0], show_feature_values=True, feature_names=feature_names)

This produces the following output. Note the score instead of probability.

The docs show that eli5 supports probabilities though, so clearly I'm doing something wrong.

I'm using eli5 version 0.8 and xgboost version 0.80 if that helps.

RyanZotti commented 5 years ago

It seems to be related to my use of xgb_model.get_booster(). Looks like the official documentation doesn't use that and passes the model as-is instead, but when I do that I get TypeError: 'str' object is not callable, so that doesn't seem to be an option.

RyanZotti commented 5 years ago

Found the fix.


import eli5
from xgboost import XGBClassifier, XGBRegressor

def _check_booster_args(xgb, is_regression=None):
    # type: (Any, bool) -> Tuple[Booster, bool]
    if isinstance(xgb, eli5.xgboost.Booster): # patch (from "xgb, Booster")
        booster = xgb
    else:
        booster = xgb.get_booster() # patch (from "xgb.booster()" where `booster` is now a string)
        _is_regression = isinstance(xgb, XGBRegressor)
        if is_regression is not None and is_regression != _is_regression:
            raise ValueError(
                'Inconsistent is_regression={} passed. '
                'You don\'t have to pass it when using scikit-learn API'
                .format(is_regression))
        is_regression = _is_regression
    return booster, is_regression

eli5.xgboost._check_booster_args = _check_booster_args

And then replaced the last line of my code so that it passes the model itself instead of the booster:

show_prediction(xgb_model, test[0], show_feature_values=True, feature_names=feature_names)

Got the solution from this other issue: https://github.com/TeamHG-Memex/eli5/issues/252
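
For readers following along, here is a minimal sketch of why the kind of object passed in matters. It mirrors the branching in the patched _check_booster_args above, but uses hypothetical stand-in classes (StandInBooster, StandInClassifier, StandInRegressor) rather than the real xgboost types, so it runs without xgboost or eli5 installed. The point it illustrates: the task type is only inferred from the scikit-learn style wrapper, while a bare booster leaves is_regression as None unless the caller sets it. That would be consistent with the score-only output reported at the top of this thread, though how eli5 treats None afterwards is an assumption here, not something this sketch demonstrates.

```python
class StandInBooster:
    """Hypothetical stand-in for the low-level booster object."""


class StandInWrapper:
    """Hypothetical stand-in for a scikit-learn style wrapper."""

    def get_booster(self):
        return StandInBooster()


class StandInClassifier(StandInWrapper):
    """Hypothetical stand-in for a classifier wrapper."""


class StandInRegressor(StandInWrapper):
    """Hypothetical stand-in for a regressor wrapper."""


def check_booster_args(model, is_regression=None):
    """Mirror of the patched helper's branching, using the stand-in types."""
    if isinstance(model, StandInBooster):
        # A bare booster carries no task type: keep whatever the caller passed.
        return model, is_regression
    inferred = isinstance(model, StandInRegressor)
    if is_regression is not None and is_regression != inferred:
        raise ValueError(
            'Inconsistent is_regression={} passed.'.format(is_regression))
    return model.get_booster(), inferred


classifier = StandInClassifier()

_, from_wrapper = check_booster_args(classifier)
_, from_booster = check_booster_args(classifier.get_booster())
_, from_booster_explicit = check_booster_args(
    classifier.get_booster(), is_regression=False)

print(from_wrapper, from_booster, from_booster_explicit)
```

In this sketch only the first and third calls end up knowing that the model is a classifier; the second, which matches the original get_booster() call with no extra arguments, does not.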

lopuhin commented 5 years ago

Thanks for the report, @RyanZotti. I think we should fix that, so I'll reopen the issue :)

lopuhin commented 5 years ago

This should be fixed with https://github.com/TeamHG-Memex/eli5/pull/268

Davidson919 commented 4 years ago

Hi, I believe I am having a very similar problem to the one described above.

I am running an XGBoost classification model, and the expected value when using explain_prediction() is often greater than 1.

I have included a screenshot below

As you can see, my model's prediction and the "y_score" reported by explain_prediction do not match.

Could you help?
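
A value above 1 is what one would expect if the number shown is a raw margin (log-odds) rather than a probability. The sketch below assumes the classifier uses a binary logistic objective and that the reported score is that raw margin; neither is confirmed in this thread. Under those assumptions, the logistic function maps the score to a probability. The function names and the sample scores are illustrative only.

```python
import math


def margin_to_probability(margin):
    """Map a raw log-odds score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-margin))


def probability_to_margin(probability):
    """Inverse mapping: the log-odds that correspond to a probability."""
    return math.log(probability / (1.0 - probability))


# Illustrative scores, not taken from any real model output.
for margin in (-2.0, 0.0, 1.5, 4.0):
    probability = margin_to_probability(margin)
    print("score {:+.1f} -> probability {:.3f}".format(margin, probability))
```

A score of 0 corresponds to a probability of 0.5, and any positive score, including values above 1, corresponds to a probability between 0.5 and 1.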

TeamHG-Memex / eli5, issue #290