elephaint / pgbm

Probabilistic Gradient Boosting Machines
Apache License 2.0

Does PGBM interface easily with Feature Importance methods? #5

Closed wrkhard closed 3 years ago

wrkhard commented 3 years ago

Hello!

Thank you for this wonderful model!

I was curious though: how well does PGBM work with things like sklearn's partial dependence plots (https://scikit-learn.org/stable/modules/partial_dependence.html), feature importance graphs, etc.?

elephaint commented 3 years ago

Hi,

Thanks! We don't support integration with these features yet, but there are two options to investigate feature importance:

  1. You can use the model.feature_importance attribute to inspect feature importance by split gain (i.e. the total accumulated split_gain per feature). More important features typically show higher gain.
  2. You can use model.permutation_importance(X, y=None, n_permutations) to compute permutation importance (see also this explanation). You can do this either supervised (i.e. 'what is the effect on my test error if I randomly permute this feature?') or unsupervised (i.e. 'how much does my prediction change when I randomly permute this feature?').
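
To make the second option concrete, here is a toy, library-free sketch of what permutation importance computes. The function and model below are illustrative only, not PGBM's actual implementation:

```python
import random

def toy_permutation_importance(predict, X, y=None, n_permutations=10, seed=0):
    """Toy sketch of permutation importance.

    If y is given (supervised), importance is the increase in mean squared
    error after shuffling a feature column; otherwise (unsupervised) it is
    the mean absolute change in the predictions themselves.
    """
    rng = random.Random(seed)
    base = [predict(row) for row in X]
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_permutations):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the association between feature j and the target
            Xp = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            pred = [predict(row) for row in Xp]
            if y is not None:
                base_err = sum((p - t) ** 2 for p, t in zip(base, y)) / len(y)
                perm_err = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
                deltas.append(perm_err - base_err)
            else:
                deltas.append(sum(abs(p - b) for p, b in zip(pred, base)) / len(base))
        importances.append(sum(deltas) / len(deltas))
    return importances

# A model that only uses feature 0, so feature 0 should dominate.
predict = lambda row: 3.0 * row[0]
X = [[float(i), float(i % 2)] for i in range(20)]
y = [3.0 * row[0] for row in X]
imp = toy_permutation_importance(predict, X, y)
```

Since the toy model ignores feature 1 entirely, its importance comes out as exactly zero, while feature 0 gets a positive score.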

For an example of all these methods, see here (Torch version) or here (Numba version).

In the future I'd first like to support SHAP values as an additional feature importance metric, since SHAP is one of the best techniques for evaluating feature importance.

Hope this helps.

elephaint commented 3 years ago

Hi, as an addition to my initial answer: a PGBM model can be wrapped in an sklearn BaseEstimator, which should allow you to use more of sklearn's functionality. The following code demonstrates how to use sklearn's partial dependence plot function with a PGBM regressor:

from pgbm import PGBM
from sklearn.base import BaseEstimator

class PGBM_sklearn(BaseEstimator):
    def __init__(self, params, objective, metric):
        self.params = params
        self.objective = objective
        self.metric = metric

    def fit(self, X, y):
        self._estimator_type = "regressor"
        self.model = PGBM()
        train_set = (X, y)
        self.model.train(train_set, params=self.params, objective=self.objective, metric=self.metric)
        # Any attribute ending in an underscore marks the estimator as
        # fitted for sklearn's check_is_fitted.
        self.fitted_ = "yes"

        return self

    def predict(self, X):
        return self.model.predict(X)

from sklearn.inspection import plot_partial_dependence
model = PGBM_sklearn(params, objective, metric).fit(X_train, y_train)
plot_partial_dependence(model, X_test, [0])
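
For reference, a one-dimensional partial dependence curve is just the model's prediction averaged over the dataset with one feature clamped to each value on a grid. A minimal pure-Python sketch of that computation (the toy model and function name below are mine, not part of PGBM or sklearn):

```python
def partial_dependence_1d(predict, X, feature, grid):
    """Average model prediction with `feature` clamped to each grid value."""
    out = []
    for v in grid:
        preds = [predict(row[:feature] + [v] + row[feature + 1:]) for row in X]
        out.append(sum(preds) / len(preds))
    return out

# Additive toy model: 2 * x0 + x1, so the PD curve for feature 0 is
# 2 * v + mean(x1) over the dataset.
predict = lambda row: 2.0 * row[0] + row[1]
X = [[1.0, float(i)] for i in range(5)]  # feature 1 averages to 2.0
pd_vals = partial_dependence_1d(predict, X, 0, [0.0, 1.0, 2.0])
```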

Perhaps this helps you.

elephaint commented 3 years ago

Hi,

As another follow-up, you might be interested to see these examples: here (Torch version) or here (Numba version).

In these examples, we specify a set of monotone constraints for two features and evaluate them using sklearn's partial_dependence function. The monotone_constraints feature is new (and still in beta, so double-check your results), so make sure to upgrade to at least version 1.1 of the package (pip install pgbm --force-reinstall in the virtual environment where you installed the package).
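
As a quick sanity check on a monotone constraint, you can also sweep the constrained feature over a grid (holding the other features fixed) and verify that the predictions only move in one direction. A toy sketch, not tied to PGBM's API:

```python
def check_monotone(predict, row, feature, grid, increasing=True):
    """Sweep one feature over a grid (others fixed) and check direction."""
    preds = []
    for v in grid:
        r = list(row)
        r[feature] = v
        preds.append(predict(r))
    pairs = list(zip(preds, preds[1:]))
    if increasing:
        return all(b >= a for a, b in pairs)
    return all(b <= a for a, b in pairs)

# Toy model: increasing in feature 0 (for non-negative values),
# decreasing in feature 1.
predict = lambda r: r[0] ** 2 - r[1]
ok_up = check_monotone(predict, [0.0, 1.0], 0, [0.0, 0.5, 1.0, 2.0], increasing=True)
ok_down = check_monotone(predict, [0.0, 1.0], 1, [0.0, 1.0, 2.0], increasing=False)
```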

wrkhard commented 3 years ago

Hello @elephaint

Thank you very much for the quick response and your assistance with this! The class wrapper works quite nicely!

elephaint commented 3 years ago

Hi, good to hear!

I've released a new version (1.2) that fixes a few bugs (specifically for the Torch-GPU version and a bug relating to the calculation of monotone constraints), but maybe more importantly I've also now included the sklearn wrapper. So, you can now simply do, e.g.

from pgbm import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = PGBMRegressor().fit(X_train, y_train)  
yhat_point = model.predict(X_test)
yhat_dist = model.predict_dist(X_test)

For the Numba version, just replace pgbm with pgbm_nb. This wrapper uses the standard MSE loss and RMSE evaluation metric, but you can supply your own loss function as a parameter. See also here for the PyTorch version and here for the Numba version for more details about the parameters.
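
To illustrate the kind of loss function meant here: gradient-boosting objectives are typically supplied as a callback returning the per-sample gradient and hessian of the loss. A sketch of the squared-error case (check the PGBM documentation for the exact callback signature the package expects; this only shows the math):

```python
def mseloss_objective(yhat, y):
    """Gradient and hessian of the squared loss 0.5 * (yhat - y)^2.

    For this loss the gradient is simply the residual (yhat - y) and the
    hessian is constant 1 for every sample.
    """
    gradient = [p - t for p, t in zip(yhat, y)]
    hessian = [1.0] * len(y)
    return gradient, hessian

g, h = mseloss_objective([2.0, 0.5], [1.0, 1.0])
```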

The torch-cpu and numba estimators pass all sklearn estimator checks (the gpu version has an issue with pickling), so they should fit nicely into the sklearn ecosystem.

elephaint commented 3 years ago

Closing this issue, feel free to reopen if you feel I missed something.