fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

parameter optimization "does not accept boolean indexers" error #26

Closed Keniajin closed 3 years ago

Keniajin commented 3 years ago

I am trying to tune parameters with the following approach, but I get a "take() does not accept boolean indexers" error.

import gpboost as gpb
import pandas as pd

df = pd.read_csv('model_gpboost/multicenter_model.csv')
y_train = df[['var1']]
X_train = df.drop(['var1', 'var2', 'var3', 'var4'], axis=1)
group = df[['var2']]
gp_model = gpb.GPModel(group_data=group, likelihood="bernoulli_probit")
gp_model.set_optim_params(params={"optimizer_cov": "gradient_descent"})
data_train = gpb.Dataset(X_train, y_train)
params = {'objective': 'binary', 'verbose': 0, 'num_leaves': 2**10}
# Small grid and deterministic grid search
param_grid_small = {'learning_rate': [0.1,0.01], 'min_data_in_leaf': [20,100],
                    'max_depth': [5,10], 'max_bin': [255,1000]}

opt_params = gpb.grid_search_tune_parameters(param_grid=param_grid_small,
                                             params=params,
                                             num_try_random=None,
                                             nfold=4,
                                             gp_model=gp_model,
                                             use_gp_model_for_validation=True,
                                             train_set=data_train,
                                             verbose_eval=1,
                                             num_boost_round=1000, 
                                             early_stopping_rounds=10,
                                             seed=1,
                                             metrics='binary_logloss') 

The detailed error:

-> 2908             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2909 
   2910         # take() does not accept boolean indexers

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Int64Index([   0,    1,    2,    3,    4,    5,    6,    8,    9,   10,\n            ...\n            8115, 8116, 8117, 8118, 8119, 8120, 8121, 8122, 8123, 8124],\n           dtype='int64', length=6093)] are in the [columns]"

Is there a problem with using a pandas DataFrame, or what am I missing?
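
One possible workaround (an untested sketch on my side) may be to pass numpy arrays instead of pandas objects:

# Untested sketch: convert the pandas objects to numpy arrays before building the dataset
y_train = df['var1'].values
X_train = df.drop(['var1', 'var2', 'var3', 'var4'], axis=1).values
group = df['var2'].values
gp_model = gpb.GPModel(group_data=group, likelihood="bernoulli_probit")
data_train = gpb.Dataset(X_train, y_train)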

Keniajin commented 3 years ago

Additionally, how can I access the feature importances as a list or DataFrame after fitting?


# Define and train GPModel
gp_model = gpb.GPModel(group_data=group)
# Create dataset for the gpb.train function
data_train = gpb.Dataset(X_train, y_train)
# Other parameters not contained in the grid of tuning parameters
params = {'objective': 'binary', 'verbose': 0, 'num_leaves': 2**10}

# Train model
bst = gpb.train(params=params, train_set=data_train, gp_model=gp_model, num_boost_round=32)
gp_model.summary()  # estimated covariance parameters
ax = gpb.plot_importance(bst)
fabsig commented 3 years ago

Thanks a lot for reporting this! I have fixed it. Starting from gpboost version 0.5.1 (on PyPI), pandas DataFrames no longer cause problems in the grid_search_tune_parameters function.
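
As a usage note, the returned opt_params can then be inspected like this (a sketch; the 'best_params', 'best_iter', and 'best_score' keys are assumed from the GPBoost demo scripts):

# Sketch: inspect the tuning result (keys as in the GPBoost demo scripts)
print('Best parameters: ' + str(opt_params['best_params']))
print('Best number of iterations: ' + str(opt_params['best_iter']))
print('Best score: ' + str(opt_params['best_score']))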

Feature importances can be obtained as follows: feature_importances = bst.feature_importance(importance_type='gain'). See also here.
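
If you want them as a sorted DataFrame, a minimal sketch (it assumes the standard Booster method feature_name() for the column names):

import pandas as pd

# Pair the gain importances with the feature names and sort descending
feat_imp = pd.DataFrame({
    'feature': bst.feature_name(),
    'gain': bst.feature_importance(importance_type='gain'),
}).sort_values('gain', ascending=False)
print(feat_imp)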