gbm-developers / gbm

Gradient boosted models (the old gbm package)

gbm.fit cannot handle ordered factors #75

Closed koenderks closed 9 months ago

koenderks commented 1 year ago

I found that using an ordered factor as the target variable causes the following error from gbm, originating in the gbm.fit function: Error in if (class(y) == "Surv") { : the condition has length > 1. Using an unordered factor avoids the problem.

Reproducible example in R:

> data(iris)
> iris$Species <- factor(iris$Species, ordered = TRUE)
> fit <- gbm::gbm(formula = Species ~ ., data = iris)             # Fails with ordered factors
Error in if (class(y) == "Surv") { : the condition has length > 1

> iris$Species <- factor(iris$Species, ordered = FALSE)           # Works with unordered factors
> fit <- gbm::gbm(formula = Species ~ ., data = iris)
Distribution not specified, assuming multinomial ...
Warning message:
Setting `distribution = "multinomial"` is ill-advised as it is currently broken. It exists only for backwards compatibility. Use at your own risk. 

The culprit is this line in gbm.fit:

if (nrow(x) != ifelse(class(y) == "Surv", nrow(y), length(y))) {
  stop("The number of rows in x does not equal the length of y.")
}

and it gives an error because class(y) is c("ordered", "factor") for an ordered factor, so the comparison with "Surv" has length 2 instead of 1.

I guess you could fix it using inherits(y, "Surv") or something.
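
A minimal sketch of the difference, in case it helps. The helper n_obs and the mock object s are made up for illustration and are not part of gbm; s is just a matrix tagged with class "Surv" so the example runs without the survival package.

```r
# An ordered factor has two classes, so `==` returns a length-2 logical.
y <- factor(c("a", "b", "c"), ordered = TRUE)
cmp <- class(y) == "Surv"      # one element per class
ok  <- inherits(y, "Surv")     # always a single TRUE/FALSE

# Stand-in for a survival response (assumption: a bare 5 x 2 matrix tagged
# with class "Surv"; real objects come from survival::Surv()).
s <- structure(matrix(0, nrow = 5, ncol = 2), class = "Surv")

# Hypothetical helper, not part of gbm: number of observations in y.
n_obs <- function(y) if (inherits(y, "Surv")) nrow(y) else length(y)

length(cmp)   # 2
ok            # FALSE
n_obs(y)      # 3
n_obs(s)      # 5
```

Because inherits() returns a single logical regardless of how many classes y carries, it is safe inside if(), which is not true of class(y) == "Surv".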

Edit: There is already a PR for this I see (https://github.com/gbm-developers/gbm/pull/58)

gregridgeway commented 1 year ago

Should be fixed now.

Just heed the note on the multinomial functionality. We never felt confident that it was bug-free, and we got some strange results from it. Also note that gbm3 is still actively maintained and is what I generally use for all my work now.

https://github.com/gbm-developers/gbm3

Greg

gregridgeway commented 9 months ago

In 2.1.9, replaced class(y)== with inherits(y, ...
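
For anyone wanting to confirm their installed copy includes this change, a rough sketch of a version check follows. The helper has_fix is hypothetical; it only compares version strings against 2.1.9, the release named above.

```r
# Hypothetical helper: TRUE when a version string is at or past 2.1.9.
has_fix <- function(v) package_version(v) >= "2.1.9"

has_fix("2.1.8")   # FALSE
has_fix("2.1.9")   # TRUE

# With gbm installed, check the local copy:
if (requireNamespace("gbm", quietly = TRUE)) {
  has_fix(as.character(utils::packageVersion("gbm")))
}
```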