fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

General Inquiry #139

Closed by eza494 1 month ago

eza494 commented 1 month ago

Hi Fabio

I am currently very impressed with this lovely package you have created.

I was wondering, when running predictions:

gp_modelNestedBT = gpb.GPModel(group_data=group_data_train_BT, likelihood="binary")
gp_modelNestedBT.fit(y=y_train_df['Target'], X=X_train2.drop(['patientID','btType'], axis=1), params={'std_dev': True, "trace": "True", "optimizer_cov": "gradient_descent"})

pred_resp_BT = gp_modelNestedBT.predict(X_pred=X_test2.drop(['patientID','btType'], axis=1), group_data_pred=group_data_test_BT, predict_var=True, predict_response=True)

1) The predictions of mean and variance have the same length as the index of X_test2. Can I further assume that the indices are the same (i.e. X_test2.index == pred_resp_BT.index), or are there any mix-ups?
2) When I try to pass nthread to fit(...) or to GPModel(...), it is not permitted. Is that normal?
3) I would also like to ask (perhaps this has been answered elsewhere): are the mean and variance per row the response and the uncertainty of that particular response? If so, can I create a very basic confidence interval for that row, or would I have to bootstrap to create confidence intervals for the prediction?

Thank you for your time

fabsig commented 1 month ago

Thank you for your interest in GPBoost!

The predictions of mean and variance have the same length as the index of X_test2. Can I further assume that the indices are the same? (i.e. X_test2.index == pred_resp_BT.index or are there any mix ups?)

Yes, every row in X_pred corresponds to the same index in the predictions (and the order is not mixed up).
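Since the row order is preserved, the predictions can be paired with the test index by position. Below is a minimal sketch of that pairing, assuming the value returned by predict(...) behaves like a dict with "mu" and "var" entries; the helper name align_predictions and the example values are hypothetical and only illustrate the idea.

```python
def align_predictions(index, pred):
    """Pair each index label with its predicted mean and variance, by position.

    Assumption: `pred` is dict-like with "mu" and "var" sequences whose order
    matches the rows passed as X_pred (e.g. index = X_test2.index).
    """
    labels = list(index)
    mu = list(pred["mu"])
    var = list(pred["var"])
    if not (len(labels) == len(mu) == len(var)):
        raise ValueError("index, mu and var must have the same length")
    # A list of tuples (not a dict) so duplicate index labels are kept.
    return [(label, m, v) for label, m, v in zip(labels, mu, var)]


# Hypothetical stand-ins for X_test2.index and the prediction output.
example_index = [101, 205, 17]
example_pred = {"mu": [0.2, 0.7, 0.5], "var": [0.16, 0.21, 0.25]}
rows = align_predictions(example_index, example_pred)
```

With pandas, the same pairing is usually written by building a DataFrame from the two arrays and passing index=X_test2.index.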

When attempting to include nthread to fit(...) or to .GPModel(...) it does not permit it is that normal?

There is no nthread argument for these functions. GPBoost parallelizes with OpenMP (OMP) and uses all available threads.
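For OpenMP-based code in general, the standard OMP_NUM_THREADS environment variable sets the default thread count. The sketch below shows how one might set it from Python before the library is loaded. Whether GPBoost honors it has not been confirmed in this thread, so treat this as an assumption to verify; the value 4 is an arbitrary example.

```python
import os

# Assumption: 4 is an arbitrary example value; choose one that fits the machine.
# The variable has to be set before the OpenMP runtime is initialised,
# i.e. before the OpenMP-backed library is imported.
os.environ["OMP_NUM_THREADS"] = "4"

# Import the library only after the variable is set, for example:
# import gpboost as gpb
```

Setting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.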

I also would like to ask (perhaps this might have been answered elsewhere) the mean and variance per row is the response and the uncertainty of that particular response? Hence I can create a very basic confidence interval for example for that row or would I have to bootstrap to create confidence intervals for the prediction?

Yes, exactly.
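A basic per-row interval from a predicted mean and variance can be sketched as below. It uses a normal approximation (mean ± z · sqrt(variance)), which is an assumption of this sketch and not something stated in the thread. For a binary response it is only a rough guide, so an optional clip to [0, 1] is included. The function name normal_interval and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(mu, var, level=0.95, clip=None):
    """Normal-approximation interval mu +/- z * sqrt(var) for one prediction.

    Assumption: the predictive distribution is treated as roughly Gaussian.
    `clip` is an optional (low, high) pair, e.g. (0.0, 1.0) for probabilities.
    """
    if var < 0:
        raise ValueError("variance must be non-negative")
    # Two-sided quantile of the standard normal for the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(var)
    low, high = mu - half_width, mu + half_width
    if clip is not None:
        low, high = max(low, clip[0]), min(high, clip[1])
    return low, high


# Hypothetical values for a single row: predicted mean 0.9, variance 0.09.
low, high = normal_interval(0.9, 0.09, clip=(0.0, 1.0))
```

Applied row by row to the predicted means and variances, this gives one interval per test observation without bootstrapping.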