RobinDenz1 / contsurvplot

An R-Package to Visualize the (Causal) Effect of a Continuous Variable on a Time-To-Event Outcome
https://robindenz1.github.io/contsurvplot/
GNU General Public License v3.0
12 stars 1 forks source link

Contsurvplot not plotting on graph correctly #8

Closed EmmaInzani closed 11 months ago

EmmaInzani commented 11 months ago

Hi Robin,

Firstly thank you very much for making such a useful and easy-to-use package for making beautiful survival plots!

I am trying to use plot_surv_area with cox hazards models of two similar response variables (latencies for chicks to peck things and the binary status of they did or didn't peck) with chick age as a predictor variable. The first model also has chick ID stratified to account for repeat measures in the model from individuals (I am normally using a coxme mixed model but for the purposes of plotting I have to use a coxph). This model plots well with the code shown below:

mod2 <- coxph(Surv(philia.latency, Obj.touch) ~ testage + strata(Chick.ID), data = dframe1b, x= TRUE)

n1<- plot_surv_area(time="philia.latency", status="Obj.touch", variable="testage", data=dframe1b, model=mod2, discrete=TRUE, bins=10, start_color="lightgrey", end_color="black", legend.title = "Chick age (Days)") n1

The second response variable model is the one I am having issues with. I have to include a binary group variable in this model to account for the group assigned to chicks and binary trial type variable to account for control and test trials. The sample size for this model is lot smaller at n=24 and code shown below:

library(survminer) library(contsurvplot) library(ggplot2) library(riskRegression) library(survival) library(pammtools)

neo1$TrialType = as.factor(neo1$TrialType) neo1$Group = as.factor(neo1$Group) neo1$ChickID = as.factor(neo1$ChickID) neo1$Status<-as.numeric(neo1$Status) neo1$AgeAtTest<-as.numeric(neo1$AgeAtTest)

mt2 <- coxph(Surv(Latency, Status) ~ AgeAtTest + Group +TrialType, data = neo1, x=TRUE) And for this model I cannot seem to get the code to actually plot the results of the model, it just plots the axis and legend as it appears to run without error. My co-author also gets the same result and cannot get this code to produce the graph. s1<- plot_surv_area(time="Latency", status="Status", variable="AgeAtTest", group = "Group", data=neo1, model=mt2, discrete=TRUE, bins=10, start_color="lightgrey", end_color="black", legend.title = "Chick age (Days)") s1 I was concerned the model maybe causing complications by having more predictor variables but I have tried removing the others and still it won't plot. A simple model plot can run for this that is not using Contsurvplot so I know the data can fit on the axis produced for the s1 graph but I would prefer to have the better graphics of this package to match the other modelled response variable.

To try make this reproducible I have put below the dput of the data I am using, and if help alternative models that splits the data by trial type to run separately so that other than chick age it only contains group as another predictor variable. Apologies if not quite correct this is my first time posting a query and I am new to survival models. If this needs more information please do let me know,

Best wishes, Emma

structure(list(ChickID = c("C1", "C1", "C10", "C10", "C11", "C11", "C12", "C12", "C13", "C13", "C14", "C14", "C15", "C15", "C16", "C16", "C17", "C17", "C18", "C18", "C19", "C19", "C2", "C2", "C20", "C20", "C21", "C21", "C22", "C22", "C23", "C23", "C24", "C24", "C3", "C3", "C4", "C4", "C5", "C5", "C6", "C6", "C7", "C7", "C8", "C8", "C9", "C9"), Group = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), TrialSet = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), TrialType = c("Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test", "Control", "Test"), Latency = c(8L, 4L, 458L, 600L, 14L, 13L, 68L, 177L, 14L, 11L, 7L, 12L, 19L, 8L, 13L, 74L, 12L, 13L, 600L, 405L, 421L, 45L, 10L, 41L, 6L, 9L, 12L, 11L, 5L, 18L, 5L, 4L, 14L, 29L, 336L, 12L, 9L, 19L, 96L, 57L, 299L, 600L, 46L, 42L, 600L, 600L, 100L, 13L), Status = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L), ArrivalAge = c(7L, 7L, 1L, 1L, 11L, 11L, 5L, 5L, 9L, 9L, 25L, 25L, 7L, 7L, 6L, 6L, 7L, 7L, 4L, 4L, 3L, 3L, 3L, 3L, 9L, 9L, 17L, 17L, 17L, 17L, 20L, 20L, 20L, 20L, 2L, 2L, 5L, 5L, 8L, 8L, 5L, 5L, 6L, 6L, 1L, 1L, 15L, 15L), AgeAtTest = c(12L, 13L, 6L, 5L, 16L, 17L, 10L, 9L, 14L, 15L, 30L, 29L, 12L, 13L, 11L, 10L, 12L, 13L, 9L, 10L, 8L, 9L, 8L, 7L, 14L, 13L, 22L, 23L, 22L, 21L, 25L, 26L, 25L, 24L, 7L, 8L, 10L, 9L, 13L, 14L, 10L, 9L, 11L, 12L, 6L, 5L, 20L, 21L), TimeGroup = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L )), class = "data.frame", row.names = c(NA, -48L))

Test trials

test1 = droplevels(subset(neo1,TrialType %in% c("Test")))

test1$Status<-as.numeric(test1$Status) test1$AgeAtTest<-as.numeric(test1$AgeAtTest)

mt3 <- coxph(Surv(Latency, Status) ~ AgeAtTest + Group, data = test1, x=TRUE)

Control trials

test2 = droplevels(subset(neo1,TrialType %in% c("Control")))

test2$Status<-as.numeric(test2$Status) test2$AgeAtTest<-as.numeric(test2$AgeAtTest)

mt4 <- coxph(Surv(Latency, Status) ~ AgeAtTest + Group, data = test2 x=TRUE)

Plots

s2<- plot_surv_area(time="Latency", status="Status", variable="AgeAtTest", group = "Group", data=test1, model=mt3, discrete=TRUE, bins=10, start_color="lightgrey", end_color="black", legend.title = "Chick age (Days)") s2

s3<- plot_surv_area(time="Latency", status="Status", variable="AgeAtTest", group = "Group", data=test2, model=mt4, discrete=TRUE, bins=10, start_color="lightgrey", end_color="black", legend.title = "Chick age (Days)") s3

RobinDenz1 commented 11 months ago

Thank you for the kind words. I am unfortunately unable to reproduce the problem. If I run this code:

library(contsurvplot)
library(survival)

neo1 <- structure(list(ChickID = c("C1", "C1", "C10", "C10", "C11", "C11",
"C12", "C12", "C13", "C13", "C14", "C14", "C15", "C15", "C16",
"C16", "C17", "C17", "C18", "C18", "C19", "C19", "C2", "C2",
"C20", "C20", "C21", "C21", "C22", "C22", "C23", "C23", "C24",
"C24", "C3", "C3", "C4", "C4", "C5", "C5", "C6", "C6", "C7",
"C7", "C8", "C8", "C9", "C9"), Group = c(1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), TrialSet = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
TrialType = c("Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test", "Control", "Test", "Control", "Test", "Control",
"Test"), Latency = c(8L, 4L, 458L, 600L, 14L, 13L, 68L, 177L,
14L, 11L, 7L, 12L, 19L, 8L, 13L, 74L, 12L, 13L, 600L, 405L,
421L, 45L, 10L, 41L, 6L, 9L, 12L, 11L, 5L, 18L, 5L, 4L, 14L,
29L, 336L, 12L, 9L, 19L, 96L, 57L, 299L, 600L, 46L, 42L,
600L, 600L, 100L, 13L), Status = c(1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L), ArrivalAge = c(7L,
7L, 1L, 1L, 11L, 11L, 5L, 5L, 9L, 9L, 25L, 25L, 7L, 7L, 6L,
6L, 7L, 7L, 4L, 4L, 3L, 3L, 3L, 3L, 9L, 9L, 17L, 17L, 17L,
17L, 20L, 20L, 20L, 20L, 2L, 2L, 5L, 5L, 8L, 8L, 5L, 5L,
6L, 6L, 1L, 1L, 15L, 15L), AgeAtTest = c(12L, 13L, 6L, 5L,
16L, 17L, 10L, 9L, 14L, 15L, 30L, 29L, 12L, 13L, 11L, 10L,
12L, 13L, 9L, 10L, 8L, 9L, 8L, 7L, 14L, 13L, 22L, 23L, 22L,
21L, 25L, 26L, 25L, 24L, 7L, 8L, 10L, 9L, 13L, 14L, 10L,
9L, 11L, 12L, 6L, 5L, 20L, 21L), TimeGroup = c(1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L
)), class = "data.frame", row.names = c(NA, -48L))

neo1$TrialType = as.factor(neo1$TrialType)
neo1$Group = as.factor(neo1$Group)
neo1$ChickID = as.factor(neo1$ChickID)
neo1$Status<-as.numeric(neo1$Status)
neo1$AgeAtTest<-as.numeric(neo1$AgeAtTest)

mt2 <- coxph(Surv(Latency, Status) ~ AgeAtTest + Group +TrialType,
             data = neo1, x=TRUE)

s1<- plot_surv_area(time="Latency",
                    status="Status",
                    variable="AgeAtTest",
                    group = "Group",
                    data=neo1,
                    model=mt2,
                    discrete=TRUE,
                    bins=10,
                    start_color="lightgrey",
                    end_color="black",
                    legend.title = "Chick age (Days)")
s1

I get the following output:

chicks

That seems to be correct to me. Could you perhaps show me what your result is if you run that code (if it differs from mine)? Or provide another example of the issue occuring?

EmmaInzani commented 11 months ago

That is correct! For me it shows this with my actual code and fulldata. However, even if I try to run it with just the section I gave you I still get this plot:

Neophobia

It must be an error in my original subsetting code for the dataset that I am not aware of. Thank you very much for such quick response and showing that the graph can be made!! :) Emma

RobinDenz1 commented 11 months ago

No problem, I am always happy to help if I can. Although I don't think that this is a bug in the contsurvplot package (which is why I closed this issue), you could do the following things to investigate this further:

1.) Check if it works with other functions of the contsurvplot package. For example, do plot_surv_at_t or plot_surv_matrix produce any results? 2.) Check if there are any unwanted missing values in your data 3.) Check if you pass the right data.frame to the function (perhaps there really is some faulty subsetting as you suggested?)

If you have any further questions on the package or the underlying methodology, please don't hesitate to ask. If you end up using it I would highly appreciate if you cited the associated paper. Thanks!

EmmaInzani commented 11 months ago

Hi Robin,

Sorry I have just realised the graphs display of the relationship between the latency to approach (response) and chick age variable actually appears to be displaying backwards in the scaling on the legend somehow? It is the opposite relationship to what the cox model and looking at the raw data suggests, which is that in later trials chicks are less likely to approach and will take longer to do so if they do approach. Any ideas why this plot is showing the opposite that younger chicks are less likely to approach which is incorrect?

Thank you again for your help and we will certainly be referencing the package in our paper!

RobinDenz1 commented 11 months ago

The hazard ratio of the AgeAtTest variable is 1.13094, which is positive. This means that higher ages are associated with a higher event probability, whatever that means in your context. This is exactly what the plot shows. Very young chicks (category 5 - 8) have the highest "survival probability", which means that it is less likely that they will experience the event of interest over time, while older chicks have a lower survival probability.

If you want to show the probability that they do experience the event instead (e.g. 1 - S(t)), you can set cif=TRUE in the function call, which basically inverts the plot.

EmmaInzani commented 11 months ago

Hi Robin,

Thank you for this, our interpretation of the data appears the opposite and I will discuss this with my coauthors. From our other model ( mod2 <- coxph(Surv(philia.latency, Obj.touch) ~ testage + strata(Chick.ID) ), which is another latency to approach an object, chick age variable is exp(coef) 0.9443, which we interpretted as higher ages were less associated with actually approaching the object and had the higher 'survival'. This made sense from what we observed in that older chicks were less likely to participate. It appeared the same was true for the latency to approach model you have helped us with that we couldnt originally plot. So I am confused. We havent included cif=TRUE in the coding (in first post) but I think we (co-authors) need to discuss these models in more depth before we try to plot again.

The plot below is for mod2 <- coxph(Surv(philia.latency, Obj.touch) ~ testage + strata(Chick.ID), data = dframe1b, x= TRUE)

image

Thank you for your time again, Emma

RobinDenz1 commented 11 months ago

No problem. In this new plot the interpretation is the opposite, that's correct.

There are many possible reasons for this, I'll list some below but that's all I can do right now:

1.) You are using different datasets for each model 2.) In the model I plotted first, you are adjusting for Group and TrialType. If those are correlated with the variable of interest and the outcome this may be the reason 3.) In the second model you are using strata(Chick.ID). Since there are only very few observations per Chick.ID this may also lead to problems. You may want to use frailty(Chick.ID) instead (which is very close to using coxme)

Hope this helps. If you have more questions or suggestions concerning the package I am happy to help again, but I prefer not to go into more detail concerning your specific study.

EmmaInzani commented 11 months ago

Thank you Robin, I think I understand this now I was forgetting the significance of whether hazard ratio was above or below 1 which is basic and silly to forget, also nothing to do with the package!

Also thank you for the advise with strata, we actually have a much bigger sample size for this second model so it runs well :)

Thank you again!!