fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

I can't save the GPModel/GPBoostClassifier #115

Closed: kaiodealmeida closed this issue 12 months ago

kaiodealmeida commented 1 year ago

Error message

~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in model_to_dict(self, include_response_data)
   5854         model_dict["X"] = self._get_covariate_data()
   5855         # Additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
-> 5856         model_dict["params"]["init_aux_pars"] = self.get_aux_pars(format_pandas=False)
   5857         # Note: for simplicity, this is put into 'init_aux_pars'. When loading the model, 'init_aux_pars' are correctly set
   5858         model_dict["model_fitted"] = self.model_fitted

~/Desktop/Codes/WinProbability/carrot_v1/lib/python3.9/site-packages/gpboost/basic.py in get_aux_pars(self, format_pandas)
   5126         else:
   5127             aux_pars = None
-> 5128         return aux_pars
   5129
   5130     def summary(self):

UnboundLocalError: local variable 'aux_pars' referenced before assignment
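
For context on the traceback: an UnboundLocalError of this kind means a local variable is only assigned on some code paths before it is used. A minimal illustration of the pattern (not the actual gpboost code):

def get_value(flag):
    # 'value' is only assigned on one branch ...
    if flag:
        value = 1
    # ... so returning it when flag is False raises the error
    return value

get_value(False)  # UnboundLocalError: local variable 'value' referenced before assignment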

fabsig commented 1 year ago

Thanks for using GPBoost!

Can you please provide a reproducible example (for me it works)?

kaiodealmeida commented 1 year ago

I have a panel dataset, but my target is binary. Do you think it's a good idea to train a GPModel and then apply it in the GPBoostClassifier?

...and it doesn't matter which model or data I use; neither joblib nor pickle works.

gpboost = gpb.GPBoostClassifier(
    num_leaves = 200,
    max_depth = 10,
    learning_rate = 0.01,
    objective = 'binary'
)

gpboost.fit(
    X = train_data[boost_features],
    y = train_data['TARGET'],
    gp_model = gpbs.gp_model,
    train_gp_model_cov_pars = False,
    verbose = 2
) 
joblib.dump(gpboost,f'{MODEL_PATH}/gpmodel_booster.joblib')

pickle.dump(gpboost,open(f'{MODEL_PATH}/gpmodel_booster.joblib','wb'))

Thanks!

fabsig commented 1 year ago

Can you please provide a reproducible example, including data (e.g., simulated), so that I can reproduce the error?

kaiodealmeida commented 1 year ago

Yes, sure!

import gpboost as gpb
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq = data['t']**2)  # Add t^2

# Partition into training and test data
n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

# Define fixed effects predictor variables
pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]

fabsig commented 1 year ago

Sorry, but this code contains no calls to any GPBoost functions. Can you please provide a reproducible example, including data (e.g., simulated), so that I can reproduce the error?

kaiodealmeida commented 1 year ago

Yes, here is the code:

import gpboost as gpb
import pandas as pd
import numpy as np
import joblib
import pickle

# Load data
data = pd.read_csv("https://raw.githubusercontent.com/fabsig/Compare_ML_HighCardinality_Categorical_Variables/master/data/wages.csv.gz")
data = data.assign(t_sq = data['t']**2)  # Add t^2

# Partition into training and test data
n = data.shape[0]
np.random.seed(n)
permute_aux = np.random.permutation(n)
train_idx = permute_aux[0:int(0.8 * n)]
test_idx = permute_aux[int(0.8 * n):n]
data_train = data.iloc[train_idx]
data_test = data.iloc[test_idx]

# Define fixed effects predictor variables
pred_vars = [col for col in data.columns if col not in ['ln_wage', 'idcode', 't', 't_sq']]

gpboost = gpb.GPBoostClassifier(
    num_leaves = 200,
    max_depth = 10,
    learning_rate = 0.01,
    objective = 'binary'
)

gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
data_bst = gpb.Dataset(data=data_train[pred_vars], label=data_train['ln_wage'])

gpboost.fit(
    X = data_train[pred_vars],
    y = data_train['ln_wage'],
    gp_model = gp_model,
    train_gp_model_cov_pars = False
)

# ERROR HERE
joblib.dump(gpboost, f'{MODEL_PATH}/gpmodel_booster.joblib')

pickle.dump(gpboost, open(f'{MODEL_PATH}/gpmodel_booster.joblib', 'wb'))

fabsig commented 1 year ago

I am getting the following error when running your code: ValueError: Unknown label type: 'continuous'.

You are trying to give a continuous label variable to a binary classifier.
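
For reference, a minimal sketch of a matching combination for this data (reusing data_train and pred_vars from your example; parameter values are illustrative, not a tested snippet): since ln_wage is continuous, a regressor with a Gaussian likelihood fits it, whereas a classifier would need a 0/1 label together with a Bernoulli likelihood in the GPModel.

import gpboost as gpb

# Continuous label -> regressor with a Gaussian likelihood
gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
reg = gpb.GPBoostRegressor(num_leaves=200, max_depth=10, learning_rate=0.01)
reg.fit(X=data_train[pred_vars], y=data_train['ln_wage'], gp_model=gp_model)

# A binary 0/1 label would instead go with GPBoostClassifier and, e.g.,
# gpb.GPModel(group_data=..., likelihood='bernoulli_probit')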

kaiodealmeida commented 1 year ago

Please, could you try with this?

gpboost = gpb.GPBoostRegressor(
    num_leaves = 200,
    max_depth = 10,
    learning_rate = 0.01,
)

fabsig commented 12 months ago

I fixed a bug when saving models (related to aux_pars). Your error should no longer appear (with version 1.2.7 or later).

FWIW: on my machine, no error occurred even with earlier versions of GPBoost; when I run your code, it runs (and did run) fine. In any case, I would not save models using pickle or joblib (I am not sure whether this works correctly), but rather use GPBoost's internal saving option: see here for an example.
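
Roughly, the internal saving workflow looks like the following minimal sketch (using the training API with the same data_train, data_test, and pred_vars as in your example; parameter values are illustrative):

import gpboost as gpb

# Train the booster together with the random effects model
gp_model = gpb.GPModel(group_data=data_train['idcode'], likelihood='gaussian')
data_bst = gpb.Dataset(data=data_train[pred_vars], label=data_train['ln_wage'])
params = {'learning_rate': 0.01, 'max_depth': 10, 'num_leaves': 200, 'verbose': 0}
bst = gpb.train(params=params, train_set=data_bst, gp_model=gp_model, num_boost_round=100)

# Save with GPBoost's own serialization and load again later
bst.save_model('model.json')
bst_loaded = gpb.Booster(model_file='model.json')
pred = bst_loaded.predict(data=data_test[pred_vars], group_data_pred=data_test['idcode'])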

Thanks a lot for reporting this issue!