fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

Further Inquiry #140

Closed eza494 closed 1 month ago

eza494 commented 1 month ago

Hi Dr. Fabio

1) Further question related to these parameters: is_unbalance and sample_pos_weight

Unbalanced Parameters

Do is_unbalance = True and sample_pos_weight work in general? I have tried passing them in the GPModel initialization and in the fit() function, but they are unavailable even though they exist as parameters. My target variable is extremely underrepresented, and I think these parameters would be a game changer if they could be used:

gp_model = gpb.GPModel(group_data=groups_train_cv, likelihood="binary", is_unbalance=True)
gp_model.fit(
    y=y_train_cv,
    X=X_train_cv.drop('patientID', axis=1),
    params={'std_dev': True, "trace": "True", "optimizer_cov": "gradient_descent"},
)

pred_resp = gp_model.predict(
    X_pred=X_test_cv.drop('patientID', axis=1),
    group_data_pred=groups_test_cv,
    predict_var=True,
    predict_response=True,
)

2) Just to confirm: when running a mixed effects model like the one above, does it use an unstructured covariance structure, with the default exponential kernel as the covariance function for determining the covariance from the data? So is the covariance structure always unstructured when using this model?

Kind Regards

fabsig commented 1 month ago
  1. No, the fit function of a GPModel does not have these parameters (is_unbalance, sample_pos_weight); see here: https://gpboost.readthedocs.io/en/latest/pythonapi/gpboost.GPModel.html#gpboost.GPModel.fit. So no, this will not work.

  2. No, when you use grouped random effects, no covariance function is used. Covariance functions such as an exponential kernel are only used for Gaussian processes (see the argument gp_coords).
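To make the distinction in point 2 concrete, here is a minimal sketch in plain Python (it does not call GPBoost). It assumes a single random intercept per group with variance sigma2, and, for the Gaussian process case, one common parameterization of the exponential kernel, sigma2 * exp(-d / rho). The helper names grouped_re_cov and exponential_kernel_cov, the toy groups and coordinates, and the parameter values are all made up for illustration.

```python
import math


def grouped_re_cov(groups, sigma2):
    """Covariance implied by one random intercept per group.

    Two observations covary (with value sigma2) only if they share a group;
    no distance or kernel is involved.
    """
    return [[sigma2 if gi == gj else 0.0 for gj in groups] for gi in groups]


def exponential_kernel_cov(coords, sigma2, rho):
    """Exponential covariance sigma2 * exp(-d / rho), d = Euclidean distance.

    This parameterization is an assumption made for this sketch.
    """
    return [
        [sigma2 * math.exp(-math.dist(ci, cj) / rho) for cj in coords]
        for ci in coords
    ]


# Hypothetical toy data: three observations, two groups / three locations.
groups = ["a", "a", "b"]
coords = [(0.0,), (1.0,), (3.0,)]

cov_grouped = grouped_re_cov(groups, sigma2=2.0)
cov_gp = exponential_kernel_cov(coords, sigma2=2.0, rho=1.5)

for row in cov_grouped:
    print(row)
for row in cov_gp:
    print([round(v, 3) for v in row])
```

In the grouped case the matrix is block-diagonal and depends only on group membership, whereas in the Gaussian process case every entry decays with the distance between the coordinates.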

eza494 commented 1 month ago

1) So is there another function I can use to leverage the is_unbalance and sample_pos_weight parameters? Do I have to use gpb.Dataset() to construct the dataset exactly?

2) So to further confirm: in the classic mixed-model sense, is the covariance structure used here none, unstructured, variance components, compound symmetry, or autoregressive? Is there a way to specify, for example, compound symmetry in the model?