Closed: wrkhard closed this issue 3 years ago
Hi,
Thanks! We don't support integration with these features yet, but there are two options to investigate feature importance:

- model.feature_importance: an attribute that reports feature importance by split_gain (i.e. the total accumulated split_gain per feature). More important features typically show higher gain.
- model.permutation_importance(X, y=None, n_permutations): a method that reports feature importance based on permutation invariance (see also this explanation). You can do this both supervised (i.e. 'what is the effect on my test error if I permute this feature randomly?') and unsupervised (i.e. 'how much does my prediction change when randomly permuting this feature?').

For an example of all these methods, see here (Torch version) or here (Numba version).
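To illustrate the supervised variant described above, here is a minimal permutation-importance sketch in plain numpy. This is not PGBM's actual implementation; the function below mirrors the described signature but is an assumption for illustration only:

```python
import numpy as np

def permutation_importance(predict, X, y, n_permutations=10, seed=0):
    """Sketch of supervised permutation importance: shuffle one feature
    at a time and measure how much the test error (MSE) increases."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((predict(X) - y) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_permutations):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature/target relationship
            errors.append(np.mean((predict(Xp) - y) ** 2))
        importance[j] = np.mean(errors) - base_error  # error increase
    return importance

# Toy setup: y depends strongly on feature 0 and not at all on feature 1
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.01 * rng.normal(size=500)
predict = lambda X: 3.0 * X[:, 0]  # stands in for a fitted model

imp = permutation_importance(predict, X, y)
# imp[0] comes out much larger than imp[1]
```

Permuting the informative feature roughly doubles the variance of the residual, while permuting the ignored feature leaves the error unchanged.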
In the future I'd first like to support Shap values as an additional feature importance metric, since this is one of the best techniques for evaluating feature importance.
Hope this helps.
Hi, as an addition to my initial answer: a PGBM model can be wrapped in a sklearn BaseEstimator, which should allow you to use more of sklearn's functionality. The following code demonstrates how to use sklearn's partial dependence plot function with a PGBM regressor:
from pgbm import PGBM
from sklearn.base import BaseEstimator

class PGBM_sklearn(BaseEstimator):
    def __init__(self, params, objective, metric):
        self.params = params
        self.objective = objective
        self.metric = metric

    def fit(self, X, y):
        self._estimator_type = "regressor"
        self.model = PGBM()
        train_set = (X, y)
        self.model.train(train_set, params=self.params, objective=self.objective, metric=self.metric)
        self.fitted_ = "yes"
        return self

    def predict(self, X):
        return self.model.predict(X)
from sklearn.inspection import plot_partial_dependence
model = PGBM_sklearn(params, objective, metric).fit(X_train, y_train)
plot_partial_dependence(model, X_test, [0])
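The same pattern unlocks other sklearn utilities besides partial dependence plots. Here is a minimal sketch of a BaseEstimator wrapper around a plain numpy least-squares fit (a purely illustrative stand-in for PGBM), used with cross_val_score:

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score

class LstsqRegressor(BaseEstimator):
    """Illustrative stand-in: any model with fit/predict wrapped this way
    works with sklearn utilities such as cross_val_score."""
    def fit(self, X, y):
        self._estimator_type = "regressor"
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add intercept column
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return Xb @ self.coef_

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

scores = cross_val_score(LstsqRegressor(), X, y, cv=5, scoring="r2")
# scores.mean() is close to 1.0 on this nearly-linear data
```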
Perhaps this helps you.
Hi,
As another follow-up, you might be interested in these examples: here (Torch version) or here (Numba version). In these examples, we specify a set of monotone constraints for two features and evaluate them using sklearn's partial_dependence function. The monotone_constraints feature is new (and still in beta, so double-check the results), so make sure to upgrade to at least version 1.1 of the package (pip install pgbm --force-reinstall in the virtual environment where you installed the package).
Hello @elephaint
Thank you very much for the quick response and your assistance with this! The class wrapper works quite nicely!
Hi, good to hear!
I've released a new version (1.2) that fixes a few bugs (specifically for the Torch-GPU version, and a bug in the calculation of monotone constraints). Perhaps more importantly, it also includes the sklearn wrapper, so you can now simply do, e.g.:
from pgbm import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = PGBMRegressor().fit(X_train, y_train)
yhat_point = model.predict(X_test)
yhat_dist = model.predict_dist(X_test)
For the Numba version, just replace pgbm with pgbm_nb. This wrapper uses the standard mse loss and rmse evaluation metric, but you can supply your own loss function as a parameter. See here (PyTorch version) or here (Numba version) for more details about the parameters.
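As an aside, the samples returned by predict_dist can be turned into prediction intervals with plain numpy quantiles. This sketch assumes the distributional output is an array of shape (n_forecast_samples, n_test_points) — simulated draws stand in for real PGBM output here:

```python
import numpy as np

# Simulated stand-in for yhat_dist: 100 forecast draws for 5 test points
rng = np.random.default_rng(0)
yhat_dist = rng.normal(loc=10.0, scale=2.0, size=(100, 5))

lower = np.quantile(yhat_dist, 0.05, axis=0)  # 5th percentile per point
upper = np.quantile(yhat_dist, 0.95, axis=0)  # 95th percentile per point
point = yhat_dist.mean(axis=0)                # point forecast from the draws
```

The interval [lower, upper] is then an empirical 90% prediction interval per test point.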
The torch-cpu and numba estimators pass all sklearn estimator checks (the gpu-version has an issue with pickling), so they should fit into the sklearn ecosystem nicely.
Closing this issue, feel free to reopen if you feel I missed something.
Hello!
Thank you for this wonderful model!
I was curious though: how well does PGBM work with things like sklearn's partial dependence plots (https://scikit-learn.org/stable/modules/partial_dependence.html), feature importance graphs, etc.?