dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

XGB-spark with SHAP value #4944

Open chorfung opened 4 years ago

chorfung commented 4 years ago

Is there any plan to add SHAP value support to xgboost-spark?

thx

trivialfis commented 4 years ago

From me, no. Not sure about others. PRs are welcome. ;-)

trams commented 4 years ago

I am interested in this too, but I lack the resources to implement it.

georgeothon commented 4 months ago

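You can now compute SHAP values for a SparkXGBClassifier or SparkXGBRegressor model: call .get_booster() on the fitted model to get the underlying booster, convert the feature columns to pandas, and pass both to shap. Note that toPandas() collects the data to the driver, so the selected rows need to fit in driver memory.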

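import shap

# pipeline.stages[1] is assumed to be the fitted SparkXGBClassifier/SparkXGBRegressor model
explainer = shap.Explainer(pipeline.stages[1].get_booster())
# toPandas() collects the selected feature columns to the driver
shap_values = explainer(df.select(features).toPandas())

shap.plots.beeswarm(shap_values)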


Versions: