RaphaelS1 / survivalmodels

Implementations of survival models in R
https://raphaels1.github.io/survivalmodels/
Other
57 stars 13 forks source link

c-index question #42

Closed RexChen1228 closed 1 year ago

RexChen1228 commented 1 year ago

"Hello, Doctor. Here is my code. I start by simulating a set of survival data and then use the code for DeepHit and DeeoSurv. However, I've encountered an issue: every time I repeat 'deepsurvmodel<-deepsurv(data = oooo[-testid , ],frac = FALSE,activation = "relu", num_nodes = c(4L, 8L, 4L, 2L),dropout =0.1,early_stopping = TRUE,epochs = 10, batch_size = 32, optimizer = "adam")

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)

reprex(concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])) concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])

and

deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu", num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10, batch_size = 32, optimizer = "adam") deephitall=predict(deephitmodel, oooo[testid , ],type = "risk") deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ]) ' daily, the calculated c-index varies. How should I select the c-index? Is it advisable to choose the c-index from the first execution or should I run it multiple times and take an average? Thank you." theta1=NULL x1=rchisq(100,1) x2=rnorm(100,0,1) n1=rnorm(50,-1,1) n2=rnorm(50,2,1) e1=sample(c(n1,n2), 100, replace=T) Y1=10+x1*1.5+x2+e1 rcl1=rchisq(100,17.5)#改成獨立的 Y_S1=NULL d1=NULL for (i in 1:100){ if(Y1[i]>rcl1[i]){ Y_S1[i]=rcl1[i] d1[i]=0 }else{ Y_S1[i]=Y1[i] d1[i]=1 } }

really_y1=NULL
d11=NULL for (i in 1:100){ if(Y1[i]>rcl1[i]){ really_y1[i]=Y1[i] d11[i]=0 }else{ really_y1[i]=Y1[i] d11[i]=1 } }

PRE=function(x){ a1=x t=c() for (i in 1:100) { t[i]=sum(ifelse(d1[i]==0,0,1) ifelse(Y_S1[i]<Y_S1[-i],1,0) ifelse(x1[i]a1+x2[i]<x1[-i]a1+x2[-i],1,0)) } s = 1/(100(99))sum(t) return(-s) } theta1=optimize(PRE,lower=1,upper=2)$minimum

theta1

x=x1*theta1+x2

origin=cbind(Y_S1,d1,x1,x2) origin1=subset(origin,d1==1) originp=subset(origin,d1==0) ooo=rbind(origin1, originp) oooo=data.frame(ooo) colnames(oooo)=c("time","status","x1","x2")

n <- nrow(oooo)

ntest <- trunc(n / 3) testid <- sample(1:n, ntest)

deepsurvmodel<-deepsurv(data = oooo[-testid , ],frac = FALSE,activation = "relu", num_nodes = c(4L, 8L, 4L, 2L),dropout =0.1,early_stopping = TRUE,epochs = 10, batch_size = 32, optimizer = "adam")

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)

reprex(concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])) concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])

deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu", num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10, batch_size = 32, optimizer = "adam") deephitall=predict(deephitmodel, oooo[testid , ],type = "risk") deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ])`

RaphaelS1 commented 1 year ago

Hi I'm sorry but this is unreadable code. Please look up how to use the reprex function, all of your code should be inside that function so it can be formatted in a readable way

RexChen1228 commented 1 year ago

Is it like this? install.packages("survivalmodels")

> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'

> (as 'lib' is unspecified)

> package 'survivalmodels' successfully unpacked and MD5 sums checked

> Warning: cannot remove prior installation of package 'survivalmodels'

> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying

> C:\Users\Rex\AppData\Local\R\win-library\4.3\00LOCK\survivalmodels\libs\x64\survivalmodels.dll

> to

> C:\Users\Rex\AppData\Local\R\win-library\4.3\survivalmodels\libs\x64\survivalmodels.dll:

> Permission denied

> Warning: restored 'survivalmodels'

>

> The downloaded binary packages are in

> C:\Users\Rex\AppData\Local\Temp\RtmpGstObx\downloaded_packages

library(survivalmodels) install.packages("reticulate")

> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'

> (as 'lib' is unspecified)

> package 'reticulate' successfully unpacked and MD5 sums checked

> Warning: cannot remove prior installation of package 'reticulate'

> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying

> C:\Users\Rex\AppData\Local\R\win-library\4.3\00LOCK\reticulate\libs\x64\reticulate.dll

> to

> C:\Users\Rex\AppData\Local\R\win-library\4.3\reticulate\libs\x64\reticulate.dll:

> Permission denied

> Warning: restored 'reticulate'

>

> The downloaded binary packages are in

> C:\Users\Rex\AppData\Local\Temp\RtmpGstObx\downloaded_packages

library(reticulate) install.packages("Hmisc")

> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'

> (as 'lib' is unspecified)

> package 'Hmisc' successfully unpacked and MD5 sums checked

>

> The downloaded binary packages are in

> C:\Users\Rex\AppData\Local\Temp\RtmpGstObx\downloaded_packages

library(Hmisc)

>

> Attaching package: 'Hmisc'

> The following objects are masked from 'package:base':

>

> format.pval, units

install.packages("survival")

> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'

> (as 'lib' is unspecified)

> package 'survival' successfully unpacked and MD5 sums checked

> Warning: cannot remove prior installation of package 'survival'

> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying

> C:\Users\Rex\AppData\Local\R\win-library\4.3\00LOCK\survival\libs\x64\survival.dll

> to C:\Users\Rex\AppData\Local\R\win-library\4.3\survival\libs\x64\survival.dll:

> Permission denied

> Warning: restored 'survival'

>

> The downloaded binary packages are in

> C:\Users\Rex\AppData\Local\Temp\RtmpGstObx\downloaded_packages

library(survival) use_condaenv("C:/Users/Rex/anaconda3/envs/py36") theta1=NULL x1=rchisq(100,1) x2=rnorm(100,0,1) n1=rnorm(50,-1,1) n2=rnorm(50,2,1) e1=sample(c(n1,n2), 100, replace=T) Y1=10+x1*1.5+x2+e1 rcl1=rchisq(100,17.5) Y_S1=NULL d1=NULL for (i in 1:100){ if(Y1[i]>rcl1[i]){ Y_S1[i]=rcl1[i] d1[i]=0 }else{ Y_S1[i]=Y1[i] d1[i]=1 } }

really_y1=NULL
d11=NULL for (i in 1:100){ if(Y1[i]>rcl1[i]){ really_y1[i]=Y1[i] d11[i]=0 }else{ really_y1[i]=Y1[i] d11[i]=1 } }

PRE=function(x){ a1=x t=c() for (i in 1:100) { t[i]=sum(ifelse(d1[i]==0,0,1) ifelse(Y_S1[i]<Y_S1[-i],1,0) ifelse(x1[i]a1+x2[i]<x1[-i]a1+x2[-i],1,0)) } s = 1/(100(99))sum(t) return(-s) } theta1=optimize(PRE,lower=1,upper=2)$minimum

theta1

> [1] 1.857841

x=x1*theta1+x2

origin=cbind(Y_S1,d1,x1,x2) origin1=subset(origin,d1==1) originp=subset(origin,d1==0) ooo=rbind(origin1, originp) oooo=data.frame(ooo) colnames(oooo)=c("time","status","x1","x2")

n <- nrow(oooo)

ntest <- trunc(n / 3) testid <- sample(1:n, ntest)

deepsurvmodel<-deepsurv(data = oooo[-testid , ],frac = FALSE,activation = "relu", num_nodes = c(4L, 8L, 4L, 2L),dropout =0.1,early_stopping = TRUE,epochs = 10, batch_size = 32, optimizer = "adam")

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)

> [1] 0.4479167

concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])

> Call:

> concordance.formula(object = Surv(time, status) ~ deepsurvall1,

> data = oooo[testid, ])

>

> n= 33

> Concordance= 0.5255 se= 0.05963

> concordant discordant tied.x tied.y tied.xy

> 225 203 3 0 0

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)

> [1] 0.4479167

concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])

> Call:

> concordance.formula(object = Surv(time, status) ~ deepsurvall1,

> data = oooo[testid, ])

>

> n= 33

> Concordance= 0.5255 se= 0.05963

> concordant discordant tied.x tied.y tied.xy

> 225 203 3 0 0

deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu", num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10, batch_size = 32, optimizer = "adam") deephitall=predict(deephitmodel, oooo[testid , ],type = "risk") deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ])

> Call:

> concordance.formula(object = Surv(time, status) ~ deephitall1,

> data = oooo[testid, ])

>

> n= 33

> Concordance= 0.2622 se= 0.05054

> concordant discordant tied.x tied.y tied.xy

> 113 318 0 0 0

deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu", num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10, batch_size = 32, optimizer = "adam") deephitall=predict(deephitmodel, oooo[testid , ],type = "risk") deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ])

> Call:

> concordance.formula(object = Surv(time, status) ~ deephitall1,

> data = oooo[testid, ])

>

> n= 33

> Concordance= 0.5 se= 0

> concordant discordant tied.x tied.y tied.xy

> 0 0 431 0 0

Created on 2023-11-10 with reprex v2.0.2

And my issue is, when I do the same thing, the c-index is different each time. Should I rely on the c-index from the first attempt, or should I do it multiple times and take an average?

RaphaelS1 commented 1 year ago

I'm sorry I want to help but firstly I still can't read the code,I don't think you're pasting it properly from repex, secondly I still don't understand the question. If you are doing something like:

train model -> make predictions -> Cindex

And repeating that twice, then the results will be different if you don't set a seed first because of randomness built into neural networks. If however you're only repeating the last step (Cindex) and the results are different then there may be a bug

RexChen1228 commented 1 year ago

Hello, doctor. I'm sorry you can't read code. I think it's my fault for not using rprex properly because I've never used it before. I'm sorry for not expressing myself clearly. My question is, due to the randomness of the neural network, the c-index varies each time. What's my criterion for comparison? Is it enough to do it once, or should I do it 10 times and average the results? Thank you.從我的iPhone傳送Raphael Sonabend @.***> 於 2023年11月11日 上午6:51 寫道: I'm sorry I want to help but firstly I still can't read the code,I don't think you're pasting it properly from repex, secondly I still don't understand the question. If you are doing something like: train model -> make predictions -> Cindex And repeating that twice, then the results will be different if you don't set a seed first because of randomness built into neural networks. If however you're only repeating the last step (Cindex) and the results are different then there may be a bug

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

RexChen1228 commented 1 year ago

I think I successfully used rpreex!

install.packages("survivalmodels")
#> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'survivalmodels' successfully unpacked and MD5 sums checked
#> Warning: cannot remove prior installation of package 'survivalmodels'
#> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
#> C:\Users\Rex\AppData\Local\R\win-library\4.3\00LOCK\survivalmodels\libs\x64\survivalmodels.dll
#> to
#> C:\Users\Rex\AppData\Local\R\win-library\4.3\survivalmodels\libs\x64\survivalmodels.dll:
#> Permission denied
#> Warning: restored 'survivalmodels'
#> 
#> The downloaded binary packages are in
#>  C:\Users\Rex\AppData\Local\Temp\Rtmp21Qngy\downloaded_packages
  library(survivalmodels)
  install.packages("reticulate")
#> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'reticulate' successfully unpacked and MD5 sums checked
#> Warning: cannot remove prior installation of package 'reticulate'
#> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
#> C:\Users\Rex\AppData\Local\R\win-library\4.3\00LOCK\reticulate\libs\x64\reticulate.dll
#> to
#> C:\Users\Rex\AppData\Local\R\win-library\4.3\reticulate\libs\x64\reticulate.dll:
#> Permission denied
#> Warning: restored 'reticulate'
#> 
#> The downloaded binary packages are in
#>  C:\Users\Rex\AppData\Local\Temp\Rtmp21Qngy\downloaded_packages
  library(reticulate)
  install.packages("Hmisc")
#> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'Hmisc' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\Rex\AppData\Local\Temp\Rtmp21Qngy\downloaded_packages
  library(Hmisc)
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
  install.packages("survival")
#> Installing package into 'C:/Users/Rex/AppData/Local/R/win-library/4.3'
#> (as 'lib' is unspecified)
#> package 'survival' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\Rex\AppData\Local\Temp\Rtmp21Qngy\downloaded_packages
  library(survival)
  use_condaenv("C:/Users/Rex/anaconda3/envs/py36")
  theta1=NULL
x1=rchisq(100,1)
x2=rnorm(100,0,1)
n1=rnorm(50,-1,1)
n2=rnorm(50,2,1)
e1=sample(c(n1,n2), 100, replace=T)
Y1=10+x1*1.5+x2+e1
rcl1=rchisq(100,17.5)
Y_S1=NULL
d1=NULL
for (i in 1:100){
  if(Y1[i]>rcl1[i]){
    Y_S1[i]=rcl1[i]
    d1[i]=0
  }else{
    Y_S1[i]=Y1[i]
    d1[i]=1
  }
}

really_y1=NULL  
d11=NULL
for (i in 1:100){
  if(Y1[i]>rcl1[i]){
    really_y1[i]=Y1[i]
    d11[i]=0
  }else{
    really_y1[i]=Y1[i]
    d11[i]=1
  }
}  

PRE=function(x){
  a1=x
  t=c()
  for (i in 1:100) {
    t[i]=sum(ifelse(d1[i]==0,0,1)*
               ifelse(Y_S1[i]<Y_S1[-i],1,0)*
               ifelse(x1[i]*a1+x2[i]<x1[-i]*a1+x2[-i],1,0))
  }
  s = 1/(100*(99))*sum(t)
  return(-s)
}
theta1=optimize(PRE,lower=1,upper=2)$minimum

theta1
#> [1] 1.787113

x=x1*theta1+x2

origin=cbind(Y_S1,d1,x1,x2)
origin1=subset(origin,d1==1)
originp=subset(origin,d1==0)
ooo=rbind(origin1, originp)
oooo=data.frame(ooo)
colnames(oooo)=c("time","status","x1","x2")

n <- nrow(oooo)

ntest <- trunc(n / 3)
testid <- sample(1:n, ntest)

deepsurvmodel<-deepsurv(data = oooo[-testid , ],frac = FALSE,activation = "relu",
                        num_nodes = c(4L, 8L, 4L, 2L),dropout =0.1,early_stopping = TRUE,epochs = 10,
                        batch_size = 32, optimizer = "adam")

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") 
deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)
#> [1] 0.6155303

concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])
#> Call:
#> concordance.formula(object = Surv(time, status) ~ deepsurvall1, 
#>     data = oooo[testid, ])
#> 
#> n= 33 
#> Concordance= 0.377 se= 0.05336
#> concordant discordant     tied.x     tied.y    tied.xy 
#>        161        266          0          0          0

deepsurvall=predict(deepsurvmodel, oooo[testid , ],type = "risk") 
deepsurvall1=-deepsurvall

survivalmodels:::cindex(deepsurvall1,oooo[testid , ]$time)
#> [1] 0.6155303

concordance(Surv(time, status) ~ deepsurvall1, data = oooo[testid , ])
#> Call:
#> concordance.formula(object = Surv(time, status) ~ deepsurvall1, 
#>     data = oooo[testid, ])
#> 
#> n= 33 
#> Concordance= 0.377 se= 0.05336
#> concordant discordant     tied.x     tied.y    tied.xy 
#>        161        266          0          0          0
deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu",
                     num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10,
                     batch_size = 32, optimizer = "adam")
deephitall=predict(deephitmodel, oooo[testid , ],type = "risk")
deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ])
#> Call:
#> concordance.formula(object = Surv(time, status) ~ deephitall1, 
#>     data = oooo[testid, ])
#> 
#> n= 33 
#> Concordance= 0.4637 se= 0.05197
#> concordant discordant     tied.x     tied.y    tied.xy 
#>        198        229          0          0          0
deephitmodel=deephit(data = oooo[-testid , ], frac = FALSE, activation = "relu",
                     num_nodes = c(4L, 8L, 4L, 2L), dropout = 0.1, early_stopping = TRUE, epochs = 10,
                     batch_size = 32, optimizer = "adam")
deephitall=predict(deephitmodel, oooo[testid , ],type = "risk")
deephitall1=-deephitall

concordance(Surv(time, status) ~ deephitall1, data = oooo[testid , ])
#> Call:
#> concordance.formula(object = Surv(time, status) ~ deephitall1, 
#>     data = oooo[testid, ])
#> 
#> n= 33 
#> Concordance= 0.2225 se= 0.04052
#> concordant discordant     tied.x     tied.y    tied.xy 
#>         95        332          0          0          0

Created on 2023-11-11 with reprex v2.0.2 `

RaphaelS1 commented 1 year ago

Great job thanks. So your question seems to be about resampling and benchmarking more generally. I recommend using mlr3proba for what you need - chapter 3 of our book will help you https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html. Repeatedly running the same code won't give you an unbiased estimate of future performance, you need to use a resampling procedure instead

RexChen1228 commented 1 year ago

Thank you, Doctor. You've solved a problem that's been bothering me for a long time! I'm really grateful for your help. I'll read this article carefully and solve the issue. Thanks again!從我的iPhone傳送Raphael Sonabend @.***> 於 2023年11月11日 下午3:26 寫道: Great job thanks. So your question seems to be about resampling and benchmarking more generally. I recommend using mlr3proba for what you need - chapter 3 of our book will help you https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html. Repeatedly running the same code won't give you an unbiased estimate of future performance, you need to use a resampling procedure instead

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>