Closed: mserragarcia closed this issue.
For the Bernoulli distribution, the default fitted values (including $cv.fitted) are on the log-odds scale, log(p/(1-p)). You can convert them to probabilities with 1/(1+exp(-predictedvalue)), or use the predict() function with type="response".
https://www.rdocumentation.org/packages/gbm/versions/2.1.8/topics/predict.gbm
Greg
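Here is a minimal sketch of the log-odds-to-probability conversion described above. It is written in Python for illustration rather than in R, and the names inv_logit and logit are hypothetical. The branch on the sign of x only avoids overflow in math.exp for large-magnitude log odds; mathematically it gives the same value as 1/(1+exp(-x)).

```python
import math

def inv_logit(x: float) -> float:
    """Map a log-odds value log(p/(1-p)) to a probability p in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use exp(x) / (1 + exp(x)) so exp() never overflows.
    e = math.exp(x)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Inverse of inv_logit: probability -> log odds (p must be in (0, 1))."""
    return math.log(p / (1.0 - p))

# Hypothetical log-odds values, e.g. a few entries of a cv.fitted vector.
log_odds = [-2.0, 0.0, math.log(3.0)]
probs = [inv_logit(v) for v in log_odds]
print(probs)
```

In R the same transformation applies elementwise to a whole vector such as the $cv.fitted component; base R's plogis() computes the same logistic function.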
From: mserragarcia
Sent: Wednesday, July 28, 2021 2:53 PM
To: gbm-developers/gbm
Cc: Subscribed
Subject: [gbm-developers/gbm] Predictions between 0 and 1 in cv.fitted, gbm with Bernoulli distribution (#61)
The 'gbm' package guide states the following for cv.fitted: "If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor. That is, the fitted values from the i-th CV-fold, for the model having been trained on the data in all other folds."
But if the outcome is binary (and estimated using Bernoulli distribution), it'd be great to have fitted values that are between 0 and 1. Is that a possibility?
Thank you!
Thank you!
Apologies for a second question: I have tried both methods, and I don't get the same results.
The two methods are: (1) transforming the $cv.fitted values with 1/(1+exp(-x)), and (2) calling predict() with type="response".
Is my use of predict() incorrect?
I think method 1 gives "the fitted values from the i-th CV-fold", whereas method 2 simply predicts on the full training sample without using the folds, which could explain the different results. How should predict() be specified to return the fitted values based on each CV fold?
Thank you!
Correct, @mserragarcia: predict() does not return the cross-validated predictions. Those are stored in the $cv.fitted component of the returned gbm object, and you will have to transform them with 1/(1+exp(-x)), the first option in @gregridgeway’s comment above.
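To see why the two methods disagree, here is a toy sketch of out-of-fold prediction, written in Python for illustration. It is not gbm's implementation: the "model" is a hypothetical intercept-only stand-in (fit_intercept_only) that returns the log odds of the training-fold event rate, and folds are assigned round-robin purely for simplicity. Each observation's cross-validated value comes from a model that never saw it, which is what $cv.fitted holds, while the in-sample value comes from one model fit on all the data, analogous to calling predict() on the final model.

```python
import math

def inv_logit(x: float) -> float:
    """Numerically stable logistic function: log odds -> probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fit_intercept_only(y):
    """Hypothetical stand-in for a fitted model: the log odds of the mean of y."""
    p = sum(y) / len(y)
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the log odds finite
    return math.log(p / (1.0 - p))

def out_of_fold_log_odds(y, k):
    """Predict each observation with a model trained on the other k-1 folds."""
    n = len(y)
    fold = [i % k for i in range(n)]  # round-robin fold assignment (assumption)
    preds = [0.0] * n
    for f in range(k):
        train = [y[i] for i in range(n) if fold[i] != f]
        model = fit_intercept_only(train)
        for i in range(n):
            if fold[i] == f:
                preds[i] = model  # intercept-only: same log odds for the whole fold
    return preds

y = [1, 0, 1, 1, 0, 0, 1, 0]
cv_fitted = out_of_fold_log_odds(y, k=4)      # analogous to $cv.fitted
in_sample = [fit_intercept_only(y)] * len(y)  # one model fit on all the data

cv_prob = [inv_logit(v) for v in cv_fitted]
full_prob = [inv_logit(v) for v in in_sample]
print(cv_prob)
print(full_prob)
```

In this toy example the held-out probabilities vary from fold to fold while the in-sample ones are all 0.5, so the two sets differ even though both pass through the same inverse-logit transform.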