XGBoost: sum of weights does not match model prediction

TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

http://eli5.readthedocs.io

MIT License


XGBoost: sum of weights does not match model prediction #274

Open noleto opened 6 years ago

noleto commented 6 years ago

Hello everyone,

Local explanations with eli5.explain_prediction yield confusing results when applied to an XGBoost model. The sum of contributions (computed from all leaf values) does not match the model prediction. This happens when base_score != 0 (which is the case by default for XGBRegressor and XGBClassifier).

Here is the code to reproduce the issue: https://gist.github.com/noleto/987eb668e785a69e87ebf29f56fda55d (Jupyter Notebook format)
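For readers who cannot open the notebook, the arithmetic behind the mismatch can be sketched with a toy example. This is only an illustration with made-up numbers, not the notebook's code and not real XGBoost or eli5 calls: it assumes a squared-error regressor whose prediction is a global offset (playing the role of base_score) plus one leaf value per tree, and an explanation that sums only the leaf values.

```python
# Toy illustration only: hand-written numbers, no real XGBoost or eli5 calls.

# Assumed global offset, playing the role of XGBoost's base_score.
BASE_SCORE = 0.5

# Hypothetical leaf values reached by one sample in a 3-tree regressor.
leaf_values = [0.20, -0.05, 0.10]

# What the model would return under the assumption above: offset + leaf values.
model_prediction = BASE_SCORE + sum(leaf_values)

# What an explanation built only from leaf values adds up to.
explanation_sum = sum(leaf_values)

# The difference between the two is exactly the offset.
gap = model_prediction - explanation_sum

print(f"model prediction: {model_prediction:.2f}")
print(f"sum of weights:   {explanation_sum:.2f}")
print(f"gap:              {gap:.2f}")
```

Under these assumptions the gap is always the offset itself, which is why the two numbers only agree when base_score is 0.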

So the question is: should ELI5 add the base_score to the local score (so that it is consistent with the model prediction) or just document better how to interpret the sum of weights?

Whatever the case, the behavior of the method as it is today can be misleading.

My 2 cents,

lopuhin commented 5 years ago

This might be the same issue as #251

coderop2 commented 5 years ago

@lopuhin what should I do here? Should I change the code so that it adds base_score to the eli5 score, or make changes to the documentation?

lopuhin commented 5 years ago

@coderop2 I think ideally we should make the score shown by eli5 equal to the model score, and also make it clear where it comes from (so show the explicit contribution of base_score somewhere), so that the sum of all feature scores and the base score is equal to the total score.

lopuhin commented 5 years ago

Or maybe we could add base_score to the bias?

noleto commented 5 years ago

Many thanks, guys, for moving forward on this issue. From a tree-based model perspective, base_score can be seen as a kind of bias, so it doesn't shock me to add both together as the total "bias". However, for someone who wants to decompose each part of the explanation, this can be confusing, because the real bias in a tree model represents the mean of the dataset (they may wonder why the value shown here is different). So, +1 for showing the explicit contribution of base_score (if any) in eli5.explain_prediction.

My 2cts,

lopuhin commented 5 years ago

Thanks @noleto, making base_score explicit makes sense 👍

coderop2 commented 5 years ago

What I propose is to include two rows in the HTML template: the first row shows the base score and the second shows the sum of base_score + eli5 score. This way we explicitly show the base score of the estimator.
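A rough sketch of what those two extra rows could look like, written as plain Python rather than against eli5's actual template code. The function `build_rows`, the labels `<BASE_SCORE>` and `<TOTAL>`, and all numbers are hypothetical and only illustrate the proposal; `<BIAS>` is used here as the label of the existing bias row.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def build_rows(feature_weights: Dict[str, float],
               bias: float,
               base_score: float) -> List[Row]:
    """Return explanation rows with an explicit base score and a total.

    Hypothetical helper, not part of eli5.
    """
    # Feature rows, largest absolute contribution first.
    rows: List[Row] = sorted(feature_weights.items(),
                             key=lambda kv: -abs(kv[1]))
    rows.append(("<BIAS>", bias))

    # Score as summed today: features plus bias, without the base score.
    eli5_score = sum(weight for _, weight in rows)

    # Proposed extra rows: the base score, then base score + eli5 score.
    rows.append(("<BASE_SCORE>", base_score))
    rows.append(("<TOTAL>", base_score + eli5_score))
    return rows


# Made-up contributions for a single prediction.
rows = build_rows({"age": 0.30, "income": -0.10}, bias=0.05, base_score=0.5)
for name, weight in rows:
    print(f"{weight:+.2f}  {name}")
```

Keeping the base score on its own row, instead of folding it into the bias, leaves the bias value readable on its own while the last row still matches the model output.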