gbm-developers / gbm

Gradient boosted models (the old gbm package)

Predictions between 0 and 1 in cv.fitted, gbm with Bernoulli distribution #61

Closed: mserragarcia closed this issue 3 years ago

mserragarcia commented 3 years ago

The 'gbm' package guide states the following for cv.fitted: "If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor. That is, the fitted values from the i-th CV-fold, for the model having been trained on the data in all other folds."

But if the outcome is binary (and estimated using Bernoulli distribution), it'd be great to have fitted values that are between 0 and 1. Is that a possibility?

Thank you!

gregridgeway commented 3 years ago

For the Bernoulli distribution, the default fitted values are on the log-odds scale, log(p/(1-p)). You can convert them to probabilities as 1/(1+exp(-predictedvalue)), or use the predict() function with type="response".

https://www.rdocumentation.org/packages/gbm/versions/2.1.8/topics/predict.gbm

Greg
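A minimal base-R sketch of the conversion described above. The log-odds values are made up for illustration, and `plogis()`/`qlogis()` are base R's logistic distribution functions used here as a convenience; they are not part of gbm.

```r
# Illustrative log-odds values, standing in for fit$cv.fitted from a
# distribution = "bernoulli" model (not output from a real model).
log_odds <- c(-2, -0.5, 0, 0.5, 2)

# Manual inverse-logit, exactly as in the comment above.
p_manual <- 1 / (1 + exp(-log_odds))

# Base R's logistic CDF computes the same transform.
p_plogis <- plogis(log_odds)

# qlogis() maps probabilities back to the log-odds scale.
back <- qlogis(p_plogis)

print(round(p_plogis, 4))
```

With a fitted model, the same call would be `plogis(fit$cv.fitted)`, where `fit` is a hypothetical gbm object trained with `cv.folds > 1`.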

(Sent by email in reply to mserragarcia's original post of Wednesday, July 28, 2021 2:53 PM.)


mserragarcia commented 3 years ago

Thank you!

mserragarcia commented 3 years ago

Apologies for a second question: I have tried both methods and I don't get the same results.

The two methods are:

  1. Converting the fitted values as indicated, using 1/(1+exp(-predictedvalue))
  2. Using predict as follows: yhat.ins = predict(gbm.fit, newdata=datatrain, type="response")

Is my use of "predict" incorrect?

I think method 1 gives "the fitted values from the i-th CV-fold", but method 2 simply predicts on the training sample (without using the folds), which could explain the different results. How should predict be specified to deliver the fitted values based on each CV fold?

Thank you!

bgreenwell commented 3 years ago

Correct, @mserragarcia: predict() does not return the cross-validated predictions. Those are stored in the $cv.fitted component of the returned gbm object, which you will have to transform as in method 1, following @gregridgeway's comment above.
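A sketch of the two approaches side by side, assuming the gbm package is installed. The simulated data, the variable names, and the tuning values (`n.trees = 300`, `shrinkage = 0.05`, and so on) are hypothetical choices for illustration. Passing `n.trees = best` to predict() is meant to keep the in-sample predictions at the CV-chosen iteration; that `cv.fitted` is evaluated at that same iteration is an assumption based on my reading of gbm 2.1.x, not something stated in this thread.

```r
library(gbm)

# Simulated binary data (hypothetical; any 0/1 outcome works).
set.seed(61)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1 - d$x2))

# Fit with 5-fold cross-validation so that fit$cv.fitted is populated.
fit <- gbm(y ~ x1 + x2, data = d, distribution = "bernoulli",
           n.trees = 300, interaction.depth = 2, shrinkage = 0.05,
           cv.folds = 5, n.cores = 1)

# Out-of-fold predictions: each row comes from the model that did not see
# its fold. cv.fitted is on the log-odds scale, so apply the inverse logit.
cv_prob <- plogis(fit$cv.fitted)

# In-sample predictions from the model trained on all rows.
best <- gbm.perf(fit, method = "cv", plot.it = FALSE)
in_prob <- predict(fit, newdata = d, n.trees = best, type = "response")

# The two sets differ by design: in-sample predictions have seen every row.
summary(cv_prob - in_prob)
```

The differences reported by summary() are expected. The in-sample predictions come from a model that was trained on the very rows it is scoring, so they tend to look more optimistic than the out-of-fold ones.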