jphall663 opened this issue 6 years ago
Just in case this helps someone else: the score is apparently the sum of the local contributions, which did not exactly match either the recorded target value or the prediction for the few instances I checked. I would like to know more about this.
@jphall663 Yes, the score should be the sum of the local contributions, and it should also equal the model output before the final non-linearity is applied. So for regression it should equal the predicted value, while for classification a softmax or sigmoid is applied to this score to get the final answer. What kind of model do you have?
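A minimal pure-Python sketch of the relationship described above, using a hypothetical hand-built tree (the tree, the features x1 and x2, and the helpers decompose and sigmoid are illustrative assumptions, not eli5 or XGBoost code). It follows the decision path, charges each change in node value to the feature that was split on, and treats the root value as the bias, in the spirit of a path-based (Saabas-style) decomposition. Because the changes telescope, bias plus contributions equals the leaf output, i.e. the raw score; a boosted model would add these per-tree quantities over all trees, and a binary classifier would then pass the total through a sigmoid.

```python
import math

# Hypothetical toy regression tree. Each internal node stores an expected
# output for its subtree ("value"); leaves store the raw leaf output.
TREE = {
    "value": 0.30, "feature": "x1", "threshold": 0.5,
    "left": {
        "value": 0.10, "feature": "x2", "threshold": 2.0,
        "left": {"value": -0.20},
        "right": {"value": 0.40},
    },
    "right": {"value": 0.70},
}


def decompose(tree, row):
    """Follow the decision path for `row`, attributing each change in node
    value to the split feature. Returns (bias, contributions, leaf_value)."""
    bias = tree["value"]  # the root value plays the role of <BIAS>
    contributions = {}
    node = tree
    while "feature" in node:
        feat = node["feature"]
        go_left = row[feat] < node["threshold"]
        child = node["left"] if go_left else node["right"]
        contributions[feat] = contributions.get(feat, 0.0) + child["value"] - node["value"]
        node = child
    return bias, contributions, node["value"]


def sigmoid(z):
    """Final non-linearity a binary classifier would apply to the score."""
    return 1.0 / (1.0 + math.exp(-z))


row = {"x1": 0.2, "x2": 3.0}
bias, contribs, leaf = decompose(TREE, row)
score = bias + sum(contribs.values())  # analogous to the reported "score"

print(f"score={score:.2f} leaf={leaf:.2f}")  # score=0.40 leaf=0.40
print(f"probability={sigmoid(score):.3f}")   # probability=0.599
```

The score and the leaf value agree up to floating-point rounding; only the classifier case changes the final number, via the sigmoid.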
This is a regression problem on simulated data with a known signal-generating function.
As you can see, the score matches neither the known target value nor the prediction in this case.
Thanks for having a look.
Thanks for providing more details. This mismatch should not happen for regression, so it could be a bug in eli5.
I think I can provide the data and code, but I need to get them from a colleague. Please give me some time.
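import h2o
import xgboost as xgb

h2o.init()
# assumption: the attached random_with_signal.csv.zip has been unzipped locally
rsframe = h2o.import_file('random_with_signal.csv')

# split with h2o: 40% train, 30% validation, remaining ~30% test
rstrain, rsvalid, rstest = rsframe.split_frame(ratios=[0.4, 0.3])

# hypothetical column names: y is the target, X the input columns
y = 'y'
X = [name for name in rsframe.columns if name != y]

def to_dmatrix(frame):
    # H2OFrame -> pandas DataFrame -> XGBoost DMatrix
    df = frame.as_data_frame()
    return xgb.DMatrix(df[X], label=df[y])

rstrain_dm = to_dmatrix(rstrain)
rsvalid_dm = to_dmatrix(rsvalid)

params = {
    'base_score': -0.013836877721816359,
    'booster': 'gbtree',
    'colsample_bytree': 0.9,
    'eta': 0.01,
    'eval_metric': 'rmse',
    'max_depth': 6,
    'nthread': 4,
    'objective': 'reg:linear',  # renamed 'reg:squarederror' in newer XGBoost releases
    'reg_alpha': 0.1,    # L1 regularization (alias of 'alpha')
    'reg_lambda': 0.01,  # L2 regularization (alias of 'lambda')
    'seed': 12345,
    'silent': 0,  # replaced by 'verbosity' in newer XGBoost releases
    'subsample': 0.7}
watchlist = [(rstrain_dm, 'train'), (rsvalid_dm, 'eval')]
rs_model = xgb.train(params,
                     rstrain_dm,
                     num_boost_round=3338,
                     evals=watchlist,
                     verbose_eval=True)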
These are the parameters used to generate the screenshot above, and the data is attached: random_with_signal.csv.zip
We think this is related to using regularization in XGBoost (reg_alpha and reg_lambda in the parameters above). It happens for both regression and classification in our tests, which you can see here: https://github.com/h2oai/mli-resources/blob/master/lime_shap_treeint_compare
(This test case should run for you without too much effort. Just clone the repo and install dependencies.)
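For reference, a minimal sketch of where the two regularization parameters enter XGBoost's leaf values for the squared-error objective, where each row's gradient is prediction minus target and its hessian is 1. It follows XGBoost's leaf-weight formula w = -G / (H + lambda), with the L1 term applied as soft-thresholding of the gradient sum G, and the result scaled by the learning rate eta; it omits options such as max_delta_step. The helper names soft_threshold and leaf_weight and the example data are hypothetical. The sketch does not show that regularization causes the eli5 mismatch; it only shows that reg_alpha and reg_lambda make leaf values differ from the plain mean residual.

```python
def soft_threshold(g, alpha):
    """L1 shrinkage of the gradient sum toward zero."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(preds, targets, reg_lambda=0.0, reg_alpha=0.0, eta=1.0):
    """Leaf value for squared error: gradient = pred - y, hessian = 1 per row."""
    grad_sum = sum(p - t for p, t in zip(preds, targets))
    hess_sum = float(len(targets))
    return -eta * soft_threshold(grad_sum, reg_alpha) / (hess_sum + reg_lambda)


preds = [0.0, 0.0, 0.0]    # hypothetical current predictions for rows in one leaf
targets = [1.0, 2.0, 3.0]  # hypothetical targets for those rows

print(leaf_weight(preds, targets))  # unregularized: the mean residual, 2.0
print(leaf_weight(preds, targets, reg_lambda=0.01, reg_alpha=0.1))  # shrunk toward 0
```

With eta=0.01, as in the parameters above, each of these leaf values would additionally be multiplied by 0.01.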
What is 'score' when I use explain_prediction_xgboost()? I see y (score: ####); what is this number?
(I have spent about an hour reading the docs trying to find this and couldn't, sorry.)
Thanks!!