ganeshmailbox opened this issue 6 years ago (status: Open)
@ganeshmailbox I'm afraid the only workaround is to simplify the xgboost model: reduce the number of trees/iterations and the number of leaves. I'm not an XGBoost tuning expert, but I would try lowering those two parameters, perhaps also check whether changing the learning rate helps, and then compare validation metrics and eli5.explain_prediction timing; you may still reach satisfactory accuracy with a smaller model. The complexity of eli5.explain_prediction for xgboost should be approximately linear in the number of trees and the depth of the trees.
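As a concrete sketch of that advice (the parameter names follow xgboost's scikit-learn wrapper; the values are illustrative, not tuned for any particular dataset):

```python
# Illustrative only: a "baseline" vs. a deliberately smaller model.
# Since explain_prediction cost grows roughly linearly with the number
# of trees and their depth, shrinking both should shrink explanation time.
baseline_params = {"n_estimators": 500, "max_depth": 8, "learning_rate": 0.05}
smaller_params = {"n_estimators": 150, "max_depth": 5, "learning_rate": 0.1}

# Rough relative cost of explaining one prediction, under the
# "linear in trees and depth" approximation from the comment above.
def relative_cost(params):
    return params["n_estimators"] * params["max_depth"]

print(relative_cost(smaller_params) / relative_cost(baseline_params))  # 0.1875
```

A higher learning rate often lets a model with fewer, shallower trees recover some of the lost accuracy, which is why it is worth adjusting alongside the size parameters.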
The proper fix here would be to profile the code and optimize it. The bottleneck is most likely somewhere in https://github.com/TeamHG-Memex/eli5/blob/master/eli5/xgboost.py#L264-L412, but profiling would confirm that. I don't think this code was ever profiled or optimized, so there may be low-hanging fruit. I would love to work on it, but will probably only have time in April. A radically different option is to use SHAP explanations: they are available natively in newer xgboost versions, so they should be faster, but they are not integrated into eli5 yet (discussed a bit in https://github.com/TeamHG-Memex/eli5/issues/254).
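Anyone who wants to pick this up could start with a standard-library profile along these lines. `explain_row` here is only a stand-in workload; in practice it would wrap the real `eli5.explain_prediction(...)` call on a trained model:

```python
import cProfile
import io
import pstats

def explain_row():
    # Placeholder for the real call, e.g.:
    #   eli5.explain_prediction(xgbmodel, pdData.values[0],
    #                           feature_names=feat_names)
    # A cheap stand-in workload so the sketch runs on its own:
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
explain_row()
profiler.disable()

# Print the ten functions with the highest cumulative time; with the
# real call this would point at the hot spots inside eli5/xgboost.py.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

Sorting by `cumulative` (rather than `tottime`) makes it easier to see which high-level eli5 helper dominates the 4-second call before drilling down.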
We have an xgboost model (0.6a) and are calling eli5.explain_prediction (eli5 0.8), and we are seeing a significant response-time issue: on the order of 4 seconds for a single row. Our model has around 200+ variables. We would like your help in finding options to significantly improve the response time for a single row of a pandas DataFrame (pdData). Here are some of the options we tried (in vain) and our questions about them:
eli5.explain_prediction(xgbmodel, pdData.values[0], top=(top_n+1), feature_names=feat_names)
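The SHAP route mentioned in the reply above bypasses eli5 entirely: newer xgboost versions can return per-feature SHAP contributions directly from `Booster.predict` via `pred_contribs=True`. The sketch below trains a tiny throwaway model just to show the call shape; the real objects in this issue (`xgbmodel`, `pdData`) would replace it:

```python
try:
    import numpy as np
    import xgboost as xgb

    # Tiny synthetic problem, only to demonstrate the API.
    rng = np.random.default_rng(0)
    X = rng.random((100, 5))
    y = (X[:, 0] > 0.5).astype(int)
    booster = xgb.train({"max_depth": 3}, xgb.DMatrix(X, label=y),
                        num_boost_round=10)

    # pred_contribs=True returns one SHAP contribution per feature,
    # plus a trailing bias column, for each row.
    contribs = booster.predict(xgb.DMatrix(X[:1]), pred_contribs=True)
    print(contribs.shape)  # (1, 6): 5 features + bias
except ImportError:
    print("xgboost/numpy not installed; this snippet is only a sketch")
```

Because the contributions are computed inside xgboost's C++ core, this is typically much faster per row than a pure-Python tree walk, though the output is a raw array rather than eli5's formatted explanation.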