Closed kaiodealmeida closed 12 months ago
Thanks for using GPBoost!
Can you please provide a reproducible example (for me it works)?
I have a panel dataset, but my target is binary. Do you think it's a good idea to train a GPModel first and then pass it to GPBoostClassifier?
...and it doesn't matter which model or data I'm using: neither joblib nor pickle works.
gpboost = gpb.GPBoostClassifier(
    num_leaves = 200,
    max_depth = 10,
    learning_rate = 0.01,
    objective = 'binary'
)
gpboost.fit(
    X = train_data[boost_features],
    y = train_data['TARGET'],
    gp_model = gpbs.gp_model,
    train_gp_model_cov_pars = False,
    verbose = 2
)
joblib.dump(gpboost,f'{MODEL_PATH}/gpmodel_booster.joblib')
pickle.dump(gpboost,open(f'{MODEL_PATH}/gpmodel_booster.joblib','wb'))
Thanks!
Can you please provide a reproducible example including data (eg simulated) so that I can reproduce the error?
Yes, sure!
import gpboost as gpb
import pandas as pd
import numpy as np

data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq = data['t']**2)  # Add t^2

n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]
Sorry, but this code contains no calls to any GPBoost functions. Can you please provide a reproducible example including data (eg simulated) so that I can reproduce the error?
Yes, here is the code:
import joblib
import pickle

import gpboost as gpb
import pandas as pd
import numpy as np

MODEL_PATH = "."  # placeholder (not in the original post): set to your output directory

data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq = data['t']**2)  # Add t^2

n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]

gpboost = gpb.GPBoostClassifier(
    num_leaves = 200,
    max_depth = 10,
    learning_rate = 0.01,
    objective = 'binary'
)

gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
data_bst = gpb.Dataset(data=data_train[pred_vars], label=data_train['ln_wage'])

gpboost.fit(
    X = data_train[pred_vars],
    y = data_train['ln_wage'],
    gp_model = gp_model,
    train_gp_model_cov_pars = False
)

joblib.dump(gpboost,f'{MODEL_PATH}/gpmodel_booster.joblib')
pickle.dump(gpboost,open(f'{MODEL_PATH}/gpmodel_booster.joblib','wb'))
I am getting the following error when running your code: ValueError: Unknown label type: 'continuous'.
You are trying to pass a continuous label variable to a binary classifier.
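For readers hitting the same ValueError, here is a rough sketch of a pre-fit check on the label column. It is only a heuristic written for illustration, not the check that scikit-learn or GPBoost actually performs, and the helper names `looks_binary` and `estimator_kind` are hypothetical.

```python
def looks_binary(labels):
    """Heuristic: at most two distinct values, all of them integer-valued."""
    distinct = set(labels)
    return len(distinct) <= 2 and all(float(v).is_integer() for v in distinct)


def estimator_kind(labels):
    """Suggest which kind of estimator matches the label column."""
    return "classifier" if looks_binary(labels) else "regressor"


# A 0/1 target fits a binary classifier; a continuous one (such as a log wage) does not.
print(estimator_kind([0, 1, 1, 0]))
print(estimator_kind([1.45, 2.31, 1.98]))
```

Under this heuristic a column like ln_wage would be routed to a regressor rather than a binary classifier.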
Could you please try it with this instead?
gpboost = gpb.GPBoostRegressor( num_leaves = 200, max_depth = 10, learning_rate = 0.01, )
I fixed a bug when saving models (related to aux_pars). Your error should no longer appear (with version 1.2.7 or later).
FWIW: on my machine, your code runs without error, and it also did with earlier versions of GPBoost. In any case, I would not save models using pickle or joblib (I am not sure whether this works correctly), but rather use GPBoost's internal saving option: see here for an example.
Thanks a lot for reporting this issue!
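A minimal sketch of the internal saving route, under two assumptions: that the fitted scikit-learn style wrapper exposes its underlying booster as `booster_` (as LightGBM's wrapper does), and that `Booster.save_model` and `gpb.Booster(model_file=...)` behave as in the GPBoost Python examples. The helper names `save_gpboost` and `load_gpboost` are hypothetical.

```python
def save_gpboost(fitted, path):
    """Save a fitted GPBoost model with its own serializer instead of pickle/joblib.

    `fitted` may be a Booster or a scikit-learn style wrapper; for a wrapper,
    the underlying Booster is assumed to be available as `booster_`.
    """
    booster = getattr(fitted, "booster_", fitted)
    booster.save_model(path)
    return path


def load_gpboost(path):
    """Load a saved model; this returns a Booster, not the scikit-learn wrapper."""
    import gpboost as gpb  # imported lazily so the helpers load without gpboost installed
    return gpb.Booster(model_file=path)
```

With the objects from this thread, the call would look like `save_gpboost(gpboost, f'{MODEL_PATH}/gpmodel_booster.json')`.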
Error message
~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in model_to_dict(self, include_response_data)
   5854             model_dict["X"] = self._get_covariate_data()
   5855         # Additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
-> 5856         model_dict["params"]["init_aux_pars"] = self.get_aux_pars(format_pandas=False)
   5857         # Note: for simplicity, this is put into 'init_aux_pars'. When loading the model, 'init_aux_pars' are correctly set
   5858         model_dict["model_fitted"] = self.model_fitted

~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in get_aux_pars(self, format_pandas)
   5126         else:
   5127             aux_pars = None
-> 5128         return aux_pars
   5129
   5130     def summary(self):

UnboundLocalError: local variable 'aux_pars' referenced before assignment
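For context, this class of error arises when a local variable is assigned only on some code paths and then returned unconditionally. The sketch below is purely illustrative and is not the GPBoost source; the function names and the `likelihood` argument are hypothetical.

```python
def aux_pars_buggy(likelihood):
    """Assigns aux_pars only in some branches, so other paths fail at the return."""
    if likelihood == "gamma":
        aux_pars = {"shape": 1.0}
    elif likelihood == "gaussian":
        pass  # nothing assigned on this path
    return aux_pars


def aux_pars_fixed(likelihood):
    """Initializes aux_pars up front so every path returns a defined value."""
    aux_pars = None
    if likelihood == "gamma":
        aux_pars = {"shape": 1.0}
    return aux_pars


try:
    aux_pars_buggy("gaussian")
except UnboundLocalError as err:
    print("buggy version raised:", type(err).__name__)

print("fixed version returned:", aux_pars_fixed("gaussian"))
```

Giving the variable a default before the branches is one common way to avoid the problem; whether the actual fix in GPBoost took this form is not stated in the thread.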