thialam closed this issue 3 years ago
I'm going to assume that by Cox models you mean neural Cox models such as CoxTime, since that's what relates to this package (and is therefore helpful to other people who see this issue).
> supposed to be encoded as factor (categorical) or numerical for COX?
'Supposed' implies it depends on the implementation: if an implementation accepts factors, use factors; otherwise encode them yourself.
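If the implementation only takes numeric inputs, one common way to encode factors is base R's `model.matrix()`, which expands each factor into 0/1 indicator columns with the first level as the reference. A minimal sketch, assuming a made-up toy data frame (`df`, `sex` and `ecog` here are hypothetical, not from any real dataset):

```r
# Hypothetical toy data, for illustration only
df <- data.frame(
  sex  = factor(c("f", "m", "m", "f")),
  ecog = factor(c(0, 1, 2, 1))
)

# Treatment (dummy) coding: one 0/1 column per non-reference level.
# The first column is the intercept, which a Cox-type model doesn't need.
X <- model.matrix(~ sex + ecog, data = df)[, -1, drop = FALSE]

colnames(X)
#> [1] "sexm"  "ecog1" "ecog2"
```

The resulting matrix can then be passed to anything that expects purely numeric features.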
> some tutorials (this one specifically said to encode sex as numeric)
In a Cox PH model these are clearly equivalent for a two-level variable like sex:
library(survival)
# Copy the rats data and recode sex as a 0/1 indicator (1 = "m")
rats_encode <- rats
rats_encode$sex <- as.numeric(rats_encode$sex == "m")
table(rats_encode$sex)
#>
#>   0   1
#> 150 150
# Compare the coefficients from the original and the 0/1-encoded data
identical(as.numeric(coef(coxph(Surv(time, status) ~ ., rats))),
          as.numeric(coef(coxph(Surv(time, status) ~ ., rats_encode))))
#> [1] TRUE
> I realised when I encode categorical variables as factors, they (understandably) give very different results
This shouldn't be the case with a linear Cox model; however, it will be in a neural Cox model if you haven't set seeds. It also depends on the type of encoding.
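The type of encoding matters most once a variable has more than two levels. A rough sketch, assuming the `lung` data that ships with survival (the object names `lung_num`, `lung_fac`, `fit_num` and `fit_fac` are just for illustration): treating `ph.ecog` as a number fits a single linear slope across the scores, whereas treating it as a factor fits a separate coefficient for each non-reference level, so the two fits are genuinely different models.

```r
library(survival)

# ph.ecog is stored as a number in lung; drop the row where it is missing
lung_num <- lung[!is.na(lung$ph.ecog), ]

# Same data, but with ph.ecog declared as a categorical variable
lung_fac <- lung_num
lung_fac$ph.ecog <- factor(lung_fac$ph.ecog)

fit_num <- coxph(Surv(time, status) ~ ph.ecog, data = lung_num)
fit_fac <- coxph(Surv(time, status) ~ ph.ecog, data = lung_fac)

# Numeric: one slope per unit increase in the score
names(coef(fit_num))

# Factor: one coefficient per level, relative to the reference level
names(coef(fit_fac))
```

For a two-level variable the two codings collapse to the same single coefficient, which is why the `rats` example with `sex` agrees exactly.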
Hello Raphael, I came to this rather obvious question super late, but after some extensive research on both the R and SPSS fronts, I am still not sure: are categorical variables such as sex and ECOG status supposed to be encoded as factor (categorical) or numerical for COX?
When I was running the models on prepared data (such as lung and breast, which encoded categorical data as numeric) and following some tutorials (this one specifically said to encode sex as numeric), I didn't think twice. But now, wrangling and analysing my own data, I realised when I encode categorical variables as factors, they (understandably) give very different results. Would you have any insight? Thank you so so much!