elephaint / pgbm

Probabilistic Gradient Boosting Machines
Apache License 2.0

Is PGBM compatible with SHAP? #10

Closed ivan-marroquin closed 2 years ago

ivan-marroquin commented 2 years ago

Is your feature request related to a problem? Please describe. Thanks for such a great package!

I noticed in the examples that there is a way to do feature importance analysis based on split gain. I was wondering whether you are planning to include examples of feature importance using SHAP (https://github.com/slundberg/shap).

Describe the solution you'd like I believe that SHAP offers an interesting way to evaluate feature importance independently of split gain. The technique is based on game theory and tries to assign each input attribute a fair share of the model's output in order to assess its importance.
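To make the idea concrete, here is a toy sketch of exact Shapley values for a small additive model, computed by brute-force coalition enumeration (purely illustrative; the names exact_shapley and predict are made up for this sketch, and this is not PGBM or shap code):

from itertools import combinations
from math import factorial
import numpy as np

def exact_shapley(predict, x, baseline):
    # Shapley value of each feature: its average marginal contribution over all coalitions
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Evaluate the "game" with coalition features set to x, the rest to baseline
                without_i = baseline.copy()
                for j in coalition:
                    without_i[j] = x[j]
                with_i = without_i.copy()
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

predict = lambda z: 3 * z[0] + 2 * z[1] + 0 * z[2]  # toy additive "model"
print(exact_shapley(predict, np.array([1.0, 1.0, 1.0]), np.zeros(3)))  # -> [3. 2. 0.]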

Kind regards,

Ivan

elephaint commented 2 years ago

Hi,

Not yet; I haven't been able to easily port the framework to work with SHAP. It is possible to do an efficient SHAP calculation for tree models, but given our Torch implementation I think it would be more efficient to rebuild that calculation in our own code rather than rely on the existing SHAP package. However, that takes a bit of time. This is definitely the first feature I aim to include, so I expect it somewhere in the next 1-2 months.

Best,

Olivier

ivan-marroquin commented 2 years ago

Hi Olivier,

Thanks for the quick response and for considering my enhancement request. I look forward to seeing your port of SHAP into PGBM. Best of luck!

Ivan

onacrame commented 2 years ago

You can always use the kernel explainer which is model agnostic (but won't be as accurate as the tree algorithm).
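For reference, a rough sketch of that route (the shap.kmeans background summary of 50 centroids and nsamples=200 are just illustrative choices to keep the kernel estimate tractable):

import shap
from pgbm import PGBMRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = PGBMRegressor().fit(X_train, y_train)

# Summarize the training data into 50 centroids so the kernel estimate stays cheap
background = shap.kmeans(X_train, 50)
explainer = shap.KernelExplainer(model.predict, background)
# Explain a subset of the test set; nsamples trades accuracy for speed
shap_values = explainer.shap_values(X_test[:100], nsamples=200)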

elephaint commented 2 years ago

Hi,

I haven't been able to reproduce the tree algorithm yet, so I would resort to @onacrame's solution of using the kernel explainer. I've added an example to the PyTorch example folder to illustrate; code snippet below:

#%% Load packages
import shap
from pgbm import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
#%% Load data
X, y = fetch_california_housing(return_X_y=True)
#%% Train pgbm
# Split data; the tiny test set keeps the SHAP computation below fast
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01)
# Train on the training set
model = PGBMRegressor().fit(X_train, y_train)
#%% Feature importance from shapley values
# model.predict is passed as a black-box function; X_test doubles as the background (masker) data
explainer = shap.Explainer(model.predict, X_test)
shap_values = explainer(X_test)
#%% Visualize the explanation for a single test sample
shap.plots.waterfall(shap_values[0])
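For a global view across the test set (rather than a single prediction), a beeswarm plot over the same Explanation object should also work:

#%% Optional: global feature importance (summarizes the per-sample Shapley values computed above)
shap.plots.beeswarm(shap_values)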

I am going to close this issue, as I don't see myself reimplementing the tree algorithm from Shap anytime soon, unfortunately.