Closed ericvoots closed 1 year ago

Hi,
If I use a custom metric like the Brier score, where lower is better, does this package support minimizing the eval metric? Or does it try to maximize by default?
Thank you
Hi,
you are looking for the greater_is_better param:
greater_is_better : bool, default=False
Effective only when searching hyperparameters.
Whether the quantity to monitor is a score function,
meaning high is good, or a loss function, meaning low is good.
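For example (a minimal sketch; the estimator and grid here are just placeholders, not your setup):

# With a loss-like metric such as the Brier score (lower is better), keep the
# default greater_is_better=False so the search minimizes the monitored metric;
# set it to True for scores like AUC where higher is better.
from shaphypetune import BoostRFE
from lightgbm import LGBMClassifier

model = BoostRFE(
    LGBMClassifier(),
    param_grid={'learning_rate': [0.2, 0.1]},
    greater_is_better=False  # minimize the eval metric during the search
)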
all the best
hmm I keep getting an error using Brier Score Loss (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html).
I was able to get it working with the AUC metric fine.
Here is the error and the function:
ValueError: y_prob contains values less than 0.
def BRS(y_hat, dtrain):
    y_true = dtrain.get_label()
    return 'brs', brier_score_loss(y_true, y_hat)
I checked the data and there's a good mixture of 1s and 0s, and nothing else.
your boosting model is simply predicting negative values.
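If that is the case, you can squash the raw margins into probabilities before scoring (a sketch adapting your BRS function; whether y_hat arrives as a raw margin or a probability depends on the API the metric is registered with):

import numpy as np
from sklearn.metrics import brier_score_loss

def BRS(y_hat, dtrain):
    y_true = dtrain.get_label()
    y_prob = 1.0 / (1.0 + np.exp(-y_hat))  # sigmoid: raw margin -> probability in [0, 1]
    return 'brs', brier_score_loss(y_true, y_prob)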
When I checked it directly from the model object, all the probabilities were above 0. I also ran into issues using the balanced accuracy measure. Only AUC seems to work.
Here is a minimal working example that runs fine... I hope you find it helpful.
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from shaphypetune import BoostRFE
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

# sklearn-style custom eval: (y_true, y_hat) -> (name, value, is_higher_better).
# The final False tells LightGBM that lower is better for this metric.
def BRIER(y_true, y_hat):
    return 'brier', brier_score_loss(y_true, y_hat, pos_label=1), False

param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}

model = BoostRFE(
    LGBMClassifier(n_estimators=150, random_state=0, metric="custom"),
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False  # the search minimizes the Brier score
)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1,
    eval_metric=BRIER
)
All the best
So BoostRFE can be used with classification models too? Most of the examples here show BoostRFE with regression models:
https://github.com/cerlymarco/shap-hypetune/blob/main/notebooks/XGBoost_usage.ipynb
All the estimators available in shap-hypetune can be used for classification and regression, with both XGBoost and LightGBM.
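For instance, a minimal regression sketch (illustrative, mirroring the classification examples in this thread):

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from shaphypetune import BoostRFE
from xgboost import XGBRegressor

X, y = make_regression(n_samples=6000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

# Same wrapper, regression estimator; early stopping uses xgboost's default
# rmse on the validation set since no custom eval_metric is passed.
model = BoostRFE(XGBRegressor(n_estimators=150, random_state=0),
                 min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)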
Ah, got you. Okay, I'm still getting errors on the Brier score, but I also got this error on balanced accuracy:
raise ValueError("Classification metrics can't handle a mix of {0} "
ValueError: Classification metrics can't handle a mix of binary and continuous targets
In both the original DB and the dataframe created for the target, all values are 0 and 1.
The regular clf_xgb fits fine and can compute both the Brier score and balanced accuracy without issue, but the code crashes at the .fit step of the BoostRFE model (and with Boruta too). Here is the code:
clf_xgb = XGBClassifier(n_estimators=2000,
random_state=0,
verbosity=3,
n_jobs=-1,
scale_pos_weight=1,
use_label_encoder=False,
objective='binary:logistic',
eval_set=[(cv_x, cv_y)])
clf_xgb.fit(train_x, train_y)
class_pred = clf_xgb.predict(train_x)
balanced_accuracy = balanced_accuracy_score(class_pred, train_y)
brier_score = brier_score_loss(class_pred, train_y)
print(brier_score)
print(balanced_accuracy)
model = BoostRFE(clf_xgb, param_grid=param_dist, min_features_to_select=1, step=1, n_iter=8, sampling_seed=0)
model.fit(train_x, train_y, eval_set=[(cv_x, cv_y)], early_stopping_rounds=6, verbose=100, eval_metric=ACC)
print(model.estimator_, model.best_params_, model.best_score_, model.n_features_)
print(f"feature ranking {model.ranking_}")
model_ranking_list = list(model.ranking_)
print(model_ranking_list)
It seems you are not using eval_metric=ACC in the regular clf_xgb.
Pay attention! I think you are passing probabilities (continuous values) to balanced_accuracy_score instead of predicted classes/targets.
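In that case, threshold the scores into hard labels first (a generic sketch with made-up arrays, not your variables):

import numpy as np
from sklearn.metrics import balanced_accuracy_score

y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.2, 0.8, 0.6, 0.4])  # continuous scores trigger the ValueError above
y_pred = (y_prob > 0.5).astype(int)      # hard 0/1 labels are what the metric expects
print(balanced_accuracy_score(y_true, y_pred))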
I was using balanced accuracy directly with the following, with no crashes:
balanced_accuracy = balanced_accuracy_score(class_pred, train_y)
and printing the score out. Even when I modify clf_xgb to use the custom accuracy function like so, there are no errors:
clf_xgb = XGBClassifier(n_estimators=2000,
random_state=0,
verbosity=3,
n_jobs=-1,
scale_pos_weight=1,
use_label_encoder=False,
objective='binary:logistic',
eval_set=[(cv_x, cv_y)],
eval_metric=ACC)
and I'm able to print both the balanced accuracy score (0.984741888307878) and the Brier score (0.02292) to the console.
Again, here is a minimal working example that runs fine... I hope you find it helpful.
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from shaphypetune import BoostRFE
from xgboost import XGBClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

# Native-style custom eval for xgboost: (y_pred, dtrain) -> (name, value).
# Return 1 - balanced accuracy so that lower is better, matching
# greater_is_better=False below.
def ACC(y_pred, dtrain):
    y_true = dtrain.get_label()
    y_pred = (y_pred > 0.5).astype(int)  # threshold probabilities into hard labels
    err = 1 - balanced_accuracy_score(y_true, y_pred)
    return 'bal_acc', err

# note: num_leaves is a LightGBM-style param; xgboost may warn and ignore it
param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}

model = BoostRFE(
    XGBClassifier(n_estimators=150, random_state=0),
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False
)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1,
    eval_metric=ACC
)
Sincerely, this is the best I can do... all the best. Bye.