dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Trouble with SHAP library. Unable to explain xgboost model trained with dual GPU. #4178

Closed kyoungrok0517 closed 5 years ago

kyoungrok0517 commented 5 years ago

For the context, please read https://github.com/slundberg/shap/issues/452#issuecomment-466109399


I'm trying to get feature attributions for my model, which was trained using dual GPUs, with the shap library. Specifically, I'm trying to use TreeExplainer().

Problem: The Python kernel dies when I pass the model as an argument to TreeExplainer(), without any error message or C-level dump.
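
For what it's worth, enabling Python's built-in faulthandler before the failing call might surface a native traceback (a general debugging sketch, not output the crash actually produced):

import faulthandler
faulthandler.enable()  # dump a traceback if the process receives SIGSEGV etc.

# ... run the failing TreeExplainer(model) call after this; if the
# interpreter crashes in native code, faulthandler prints the Python
# stack of every thread to stderr before the process dies.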

Here's my code.

import json

import shap
import xgboost as xgb

def load_model(fname):
    model = xgb.XGBClassifier()
    model.load_model(fname)
    # the training parameters are stored in a sibling .json file
    with open(fname.replace('.xgboost', '.json'), encoding='utf-8') as fin:
        params = json.load(fin)
    model.set_params(**params)
    return model

model = load_model('./model/model_2-gram_2019-02-20T15-10-38.xgboost')

params = {
     'tree_method': 'hist',
     'nthread': 4,
     'predictor': 'cpu_predictor', 
     'n_gpus': 1
}
model.set_params(**params)

# compute the SHAP values for every prediction in the validation dataset
# DIES HERE!
explainer = shap.TreeExplainer(model)

Here's the link to my model and the parameters.

Why am I posting a shap-related report here? Because the author of the shap library suspects there's a bug in the way XGBoost saves GPU-trained models. I've tried the two tests the author asked me to run, and he said that if both tests fail, the problem is likely in the core of XGBoost, not in shap. Here's the quote from the shap author:

Hmmm. I wonder if it has something to do with how XGBoost saved the GPU trained model. To narrow down the problem you could try giving approximate=True to the shap_values function or using the `feature_dependence='independent'` option of TreeExplainer with 100 background samples. Both of those options exercise different code paths and might help pinpoint the issue. If all of those fail then it is probably something core to XGBoost and not specific to shap.
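
For concreteness, here's a minimal sketch of those two diagnostic paths as I understand them (X_valid is a hypothetical validation feature matrix; `feature_dependence` matches the shap API of that era, later renamed `feature_perturbation`):

import shap

# Test 1: approximate (Saabas-style) attributions -- a different code path
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_valid, approximate=True)

# Test 2: 'independent' feature perturbation with 100 background samples
background = shap.sample(X_valid, 100)  # or any 100-row subset of X_valid
explainer = shap.TreeExplainer(model, data=background,
                               feature_dependence='independent')
shap_values = explainer.shap_values(X_valid)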


I know I should be cautious about attributing the problem to XGBoost, but since this issue is quite important to me, I'm asking the XGBoost team to check whether there's an error in the way my model is saved. Please test my model and help figure out the problem. Thanks!

kyoungrok0517 commented 5 years ago

UPDATE: Is there any difference in the way CPU- and GPU-trained models are saved? Maybe there are peculiarities in the model file format that the shap library hasn't taken into account. If I can understand the difference, I can explain it to the shap authors and request a fix.
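
One way I could check this myself might be to dump both a CPU- and a GPU-trained booster to JSON and diff the tree structures (a sketch with hypothetical file names; dump_model is the standard Booster API):

import xgboost as xgb

def dump_trees(model_fname, out_fname):
    model = xgb.XGBClassifier()
    model.load_model(model_fname)
    # write every tree as JSON; odd thresholds or fields that appear
    # only in the GPU dump would point at a serialization difference
    model.get_booster().dump_model(out_fname, dump_format='json')

dump_trees('model_cpu.xgboost', 'cpu_trees.json')
dump_trees('model_gpu.xgboost', 'gpu_trees.json')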