bethatkinson / rpart

Recursive Partitioning and Regression Trees
43 stars 23 forks source link

Survival trees outcome #37

Open ambramacis opened 2 years ago

ambramacis commented 2 years ago

Hi, I am fitting survival trees with rpart but I have some doubts concerning the resulting outcome. I understood that the node outcome (yval) should be a measure of relative risk.

However I don't understand how this value is computed and its interpretation. Moreover, using survival times following an Exponential distribution (simulated data), I don't understand how to get the correspondent parameter lambda from the node outcome.

Could you please help me? Thank you.

Here below an example of R code:

library(simsurv)
library(survival)
library(rpart)
library(rpart.plot)

maxtime <- 3 #length of the follow-up

#********Parameter for Exponential times********
lamb1 <- 1.5 #lambda for Exponential time node 1 if X1=1
lamb2 <- 0.8 #lambda for Exponential time node 2 if X1=0 

N <- 2000
seed <- 1511
set.seed(seed)

##########################
# Data generating process
X1 <- factor(rbinom(N,1,0.5) )
x1_1 <- X1==1 #node 1 lamb1
n_x1_1 <- sum(x1_1)
n_x1_0 <- N - n_x1_1
simdata <- as.data.frame(matrix(NA, nrow=N, ncol=2))
simdata[x1_1,]  <- simsurv(dist="exponential",lambda=lamb1,x=as.data.frame(1:n_x1_1),
                           maxt=maxtime,seed=seed)[,2:3] 
simdata[!x1_1,] <- simsurv(dist="exponential",lambda=lamb2,x=as.data.frame(1:n_x1_0),
                           maxt=maxtime,seed=seed)[,2:3] 
simdata <- cbind(simdata, X1)
colnames(simdata) <- c("eventtime","status","X1")

##########################
# Tree estimation
formula <- Surv(simdata$eventtime, simdata$status)~X1
survtree_rpart <- rpart(formula,data=simdata,
                        control=rpart.control(usesurrogate = 2, minsplit = 20, xval=10))
bestcp <- survtree_rpart$cptable[which.min(survtree_rpart$cptable[,"xerror"]),"CP"]
pruned_survtree_rpart <- prune(survtree_rpart,cp=bestcp)
rpart.plot(pruned_survtree_rpart, type=5, tweak=1.5, gap=0.02, branch.lty=2)