rphsantos closed this issue 5 years ago
thanks – I’ll take a look at this
From: rphsantos
Sent: Friday, May 17, 2019 9:51 AM
To: bethatkinson/rpart
Subject: [bethatkinson/rpart] High computational time for categorical values (#5)
Hi there,
I'm having some trouble with a data set that has a lot of categorical variables (rpart takes a long time to fit), and there is a nice analysis of this issue in this Stack Overflow question: https://stackoverflow.com/questions/17195021/rpart-computational-time-for-categorical-vs-continuous-regressors
In short, it seems the for loop in the internal function rpart:::rpart.matrix is not very efficient, and Mr. Hong Ooi proposed this modification:
system.time(mm <- rpart:::rpart.matrix(m))
   user  system elapsed
 208.25   88.03  296.99

f <- function(frame)
{
    if (!inherits(frame, "data.frame") || is.null(attr(frame, "terms")))
        return(as.matrix(frame))
    frame[] <- lapply(frame, function(x) {
        if (is.character(x))
            as.numeric(factor(x))
        else if (!is.numeric(x))
            as.numeric(x)
        else x
    })
    X <- model.matrix(attr(frame, "terms"), frame)[, -1L, drop = FALSE]
    colnames(X) <- sub("^`(.*)`", "\\1", colnames(X))
    class(X) <- c("rpart.matrix", class(X))
    X
}

system.time(mm2 <- f(m))
   user  system elapsed
   0.65    0.04    0.70

identical(mm, mm2)
[1] TRUE
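
For anyone who wants to try the idea without a large data set, the sketch below compares the two conversion styles on a small synthetic model frame. It rests on several assumptions. `convert_loop` is a hypothetical stand-in for a column-by-column for loop of the kind described above and is not copied from rpart's source. `convert_lapply` mirrors only the `lapply` step of the proposed `f`. The data, the column names, and both function names are made up for illustration. The sketch only checks that the two approaches produce the same columns and makes no claim about timings.

```r
# Synthetic data: a numeric response, two factor predictors, one numeric
# predictor. All names here are hypothetical.
set.seed(1)
n <- 1000
dat <- data.frame(
  y  = rnorm(n),
  f1 = factor(sample(letters[1:5], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  x1 = runif(n)
)

# A model frame carries a "terms" attribute, which f() above relies on.
mf <- model.frame(y ~ f1 + f2 + x1, data = dat)

# Hypothetical stand-in: convert non-numeric columns one at a time,
# assigning back into the data frame on every iteration.
convert_loop <- function(frame) {
  for (i in seq_along(frame)) {
    x <- frame[[i]]
    if (is.character(x)) {
      frame[[i]] <- as.numeric(factor(x))
    } else if (!is.numeric(x)) {
      frame[[i]] <- as.numeric(x)
    }
  }
  frame
}

# Same conversion done in one pass, with a single assignment to the frame.
convert_lapply <- function(frame) {
  frame[] <- lapply(frame, function(x) {
    if (is.character(x)) {
      as.numeric(factor(x))
    } else if (!is.numeric(x)) {
      as.numeric(x)
    } else {
      x
    }
  })
  frame
}

a <- convert_loop(mf)
b <- convert_lapply(mf)

# Compare the converted factor columns from the two approaches.
print(identical(a$f1, b$f1))
print(identical(a$f2, b$f2))
```

To get a rough feel for the cost difference on a given machine, wrap each call in `system.time()` and increase `n` and the number of factor columns. Results will depend on the data and the R version.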