A Python package for simultaneous hyperparameter tuning and feature selection for gradient boosting models.
Hyperparameter tuning and feature selection are two common steps in most machine learning pipelines. They are usually carried out separately and independently, which can lead to suboptimal performance and a more time-consuming process.
shap-hypetune combines hyperparameter tuning and feature selection in a single pipeline, searching for the best number of features and the best parameter configuration at the same time. Hyperparameter tuning or feature selection can also be carried out as standalone operations.
shap-hypetune main features:
pip install --upgrade shap-hypetune
lightgbm and xgboost are not required dependencies. The module depends only on NumPy, shap, scikit-learn and hyperopt. Python 3.6 or above is supported.
from shaphypetune import BoostSearch, BoostRFE, BoostRFA, BoostBoruta
BoostSearch(
    estimator,                # LGBModel or XGBModel
    param_grid=None,          # parameters to be optimized
    greater_is_better=False,  # minimize or maximize the monitored score
    n_iter=None,              # number of sampled parameter configurations
    sampling_seed=None,       # the seed used for parameter sampling
    verbose=1,                # verbosity mode
    n_jobs=None               # number of jobs to run in parallel
)
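The interplay of `param_grid`, `n_iter`, `sampling_seed` and `greater_is_better` can be pictured with a small standalone sketch. It is not shap-hypetune's implementation: `sample_configurations`, `search` and `score_fn` are hypothetical names, only the standard library is used, and the sketch assumes that leaving `n_iter` unset means an exhaustive grid while setting it means seeded random sampling.

```python
import itertools
import random


def sample_configurations(param_grid, n_iter=None, sampling_seed=None):
    """Return the parameter configurations to evaluate (illustrative only)."""
    names = sorted(param_grid)
    if n_iter is None:
        # Assumed behaviour: enumerate the full Cartesian product of values.
        combos = itertools.product(*(param_grid[name] for name in names))
        return [dict(zip(names, combo)) for combo in combos]
    # Otherwise draw n_iter configurations; the seed makes the draw repeatable.
    rng = random.Random(sampling_seed)
    return [
        {name: rng.choice(param_grid[name]) for name in names}
        for _ in range(n_iter)
    ]


def search(param_grid, score_fn, greater_is_better=False,
           n_iter=None, sampling_seed=None):
    """Score every configuration and keep the best one.

    score_fn is a hypothetical callback mapping a configuration to a score,
    standing in for "fit the model and read the monitored metric".
    """
    configs = sample_configurations(param_grid, n_iter, sampling_seed)
    pick = max if greater_is_better else min
    best = pick(configs, key=score_fn)
    return best, score_fn(best)
```

With `greater_is_better=False` the monitored score is treated as a loss, so the configuration with the lowest score wins; setting it to `True` flips the comparison.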
BoostRFE(
    estimator,                              # LGBModel or XGBModel
    min_features_to_select=None,            # the minimum number of features to be selected
    step=1,                                 # number of features to remove at each iteration
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    importance_type='feature_importances',  # 'feature_importances' (default) or 'shap_importances'
    train_importance=True,                  # compute the SHAP importance on the training set (True) or on the eval_set (False)
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)
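As a rough illustration of how `step` and `min_features_to_select` drive Recursive Feature Elimination, here is a minimal standard-library sketch. It is a simplified picture of the general technique, not the package's code: `recursive_feature_elimination` and `importance_fn` are hypothetical names, and the callback stands in for refitting the model and reading its importances.

```python
def recursive_feature_elimination(features, importance_fn,
                                  min_features_to_select, step=1):
    """Drop the `step` least important features until the minimum is reached.

    importance_fn(selected) is a hypothetical callback that returns a dict
    mapping each currently selected feature to its importance.
    """
    selected = list(features)
    while len(selected) > min_features_to_select:
        importances = importance_fn(selected)
        # Never remove more features than the minimum allows.
        n_drop = min(step, len(selected) - min_features_to_select)
        # Rank ascending by importance and discard the weakest n_drop.
        weakest = set(sorted(selected, key=lambda f: importances[f])[:n_drop])
        selected = [f for f in selected if f not in weakest]
    return selected
```

A larger `step` needs fewer refits but prunes more coarsely, since several features are dropped on the evidence of a single ranking.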
BoostBoruta(
    estimator,                              # LGBModel or XGBModel
    perc=100,                               # threshold used to compare shadow and real features
    alpha=0.05,                             # p-value level for feature rejection
    max_iter=100,                           # maximum Boruta iterations to perform
    early_stopping_boruta_rounds=None,      # maximum iterations without confirming a feature
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    importance_type='feature_importances',  # 'feature_importances' (default) or 'shap_importances'
    train_importance=True,                  # compute the SHAP importance on the training set (True) or on the eval_set (False)
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)
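The roles of `perc` and `alpha` are easier to see in a toy version of the Boruta decision rule. The sketch below follows the general Boruta idea (real features compete against shuffled "shadow" copies, and hit counts are tested against chance), but it is an assumption-laden simplification rather than the package's code: the function names are hypothetical, the percentile uses the nearest-rank method, and the multiple-testing correction that Boruta implementations usually apply is left out. It needs Python 3.8+ for `math.comb`.

```python
import math


def shadow_threshold(shadow_importances, perc=100):
    """Percentile of the shadow importances (perc=100 gives the maximum)."""
    ordered = sorted(shadow_importances)
    # Nearest-rank percentile, chosen here only for simplicity.
    rank = max(1, math.ceil(perc / 100 * len(ordered)))
    return ordered[rank - 1]


def binomial_tail(hits, n_trials, p=0.5):
    """P(X >= hits) for X ~ Binomial(n_trials, p)."""
    return sum(
        math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(hits, n_trials + 1)
    )


def boruta_decision(hits, n_trials, alpha=0.05):
    """Classify a feature from how often it beat the shadow threshold."""
    if binomial_tail(hits, n_trials) < alpha:
        return "confirmed"  # beats the shadows more often than chance
    if binomial_tail(n_trials - hits, n_trials) < alpha:
        return "rejected"   # loses to the shadows more often than chance
    return "tentative"      # undecided, keep iterating
```

Lowering `perc` makes the threshold easier to beat, so more features collect hits; lowering `alpha` demands stronger evidence before a feature is confirmed or rejected.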
BoostRFA(
    estimator,                              # LGBModel or XGBModel
    min_features_to_select=None,            # the minimum number of features to be selected
    step=1,                                 # number of features to add at each iteration
    param_grid=None,                        # parameters to be optimized
    greater_is_better=False,                # minimize or maximize the monitored score
    importance_type='feature_importances',  # 'feature_importances' (default) or 'shap_importances'
    train_importance=True,                  # compute the SHAP importance on the training set (True) or on the eval_set (False)
    n_iter=None,                            # number of sampled parameter configurations
    sampling_seed=None,                     # the seed used for parameter sampling
    verbose=1,                              # verbosity mode
    n_jobs=None                             # number of jobs to run in parallel
)
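Recursive Feature Addition works in the opposite direction to elimination: it starts from a small set of top-ranked features and grows it. The following standard-library sketch shows one simplified way to do that, under stated assumptions: `recursive_feature_addition` and `score_fn` are hypothetical names, the ranking is taken as given, and the best-scoring prefix of the ranking is returned. The package's own procedure may differ in its details.

```python
def recursive_feature_addition(ranked_features, score_fn,
                               min_features_to_select, step=1,
                               greater_is_better=False):
    """Grow the feature set `step` features at a time, keeping the best subset.

    ranked_features is ordered from most to least important.
    score_fn(subset) is a hypothetical callback standing in for
    "fit the model on this subset and read the monitored score".
    """
    def improves(new, old):
        return new > old if greater_is_better else new < old

    n = min_features_to_select
    best_subset = ranked_features[:n]
    best_score = score_fn(best_subset)
    while n < len(ranked_features):
        # Add the next `step` features from the ranking.
        n = min(n + step, len(ranked_features))
        candidate = ranked_features[:n]
        score = score_fn(candidate)
        if improves(score, best_score):
            best_subset, best_score = candidate, score
    return best_subset, best_score
```

In this sketch `min_features_to_select` is the size of the starting set, so the returned subset never has fewer features than that.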
Full examples are available in the notebooks folder.