Dear Authors of mice,
I have a big imputed dataset with many variables. Some of them are used to derive a compound endpoint, let's call it "success of therapy". Now I want to plot the curves of the cumulative probability of success (= just 1 - Kaplan-Meier survival) for two treatments, surrounded by their pointwise CIs.
The problem is that I have several imputed datasets. Pooling works out of the box with Cox, but I want a non-parametric estimator across the 2 groups, the Kaplan-Meier.
Let's take this trivial example. For the sake of simplicity, it has no missing values in the survival variables or in the predictor used. I want to see whether the same dataset, replicated several times, gives the same results after pooling as the analysis of the raw data.
Of course my REAL data will have some gaps. After the imputation the status will vary a bit between the imputed datasets. The times will always be the same.
So this is my example data. Not very meaningful, but it suffices:
library(mice)
library(survival)

surv_data <- data.frame(time   = c(4,3,1,1,2,2,3,5,2,4,5,1),
                        status = c(1,1,1,0,1,1,0,0,1,1,0,0),
                        x      = c(0,2,1,1,4,7,0,1,1,2,0,1),
                        sex    = c(0,0,0,0,1,1,1,1,0,1,0,0))
# Let's "impute" (generate) just 10 identical datasets.
# All will have the same survival (no imputation in time, status and sex).
imp <- mice(surv_data, m = 10)
Should it be something like this?
This agrees with the result from the complete data:
The negative lower limits come from the normal approximation. I'll try a transformation later.
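To make the idea concrete, here is a minimal sketch of pooling Kaplan-Meier estimates at fixed time points with Rubin's rules on the plain probability scale. It is not the code from this post: `pool_km_plain()` is a hypothetical helper (it exists in neither mice nor survival), it handles a single group only, and it takes a plain list of completed datasets rather than a `mids` object. The list is built here by replicating the example data, as an assumption standing in for real imputations.

```r
library(survival)

# Hypothetical helper (not part of mice or survival): pools KM survival
# estimates from a list of completed datasets on the untransformed scale.
pool_km_plain <- function(datasets, times, level = 0.95) {
  m <- length(datasets)  # number of completed datasets, assumed >= 2
  # One row per dataset, one column per requested time point
  est <- se <- matrix(NA_real_, nrow = m, ncol = length(times))
  for (i in seq_len(m)) {
    s <- summary(survfit(Surv(time, status) ~ 1, data = datasets[[i]]),
                 times = times, extend = TRUE)
    est[i, ] <- s$surv     # KM survival at the requested times
    se[i, ]  <- s$std.err  # Greenwood standard error of the survival
  }
  qbar <- colMeans(est)           # pooled estimate
  ubar <- colMeans(se^2)          # within-imputation variance
  b    <- apply(est, 2, var)      # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b  # total variance
  # Rubin's degrees of freedom; infinite when there is no between variance
  df   <- ifelse(b > 0, (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2, Inf)
  half <- qt(1 - (1 - level) / 2, df) * sqrt(tvar)
  data.frame(time = times, surv = qbar,
             lower = qbar - half, upper = qbar + half)
}

# Example data from this post, replicated 10 times instead of real imputations
surv_data <- data.frame(time   = c(4,3,1,1,2,2,3,5,2,4,5,1),
                        status = c(1,1,1,0,1,1,0,0,1,1,0,0),
                        x      = c(0,2,1,1,4,7,0,1,1,2,0,1),
                        sex    = c(0,0,0,0,1,1,1,1,0,1,0,0))
datasets <- rep(list(surv_data), 10)
pooled_plain <- pool_km_plain(datasets, times = 1:5)
print(pooled_plain)
```

Because the limits are computed as estimate plus or minus a multiple of the standard error, nothing keeps them inside [0, 1], which is where the negative lower limits come from. For two treatments the same helper could be applied to each group separately.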
I also tried the "veteran" dataset.
Good! They agree, since all the datasets are identical. I just wanted to check that the construction is correct, so that it should also work when I pass truly different imputed datasets.
There is a minor discrepancy at the upper CI limit: either it is a rounding issue, or the survfit summary truncated it at 1.
I also tried the log-log transformation to avoid these artefacts:
Perfect agreement.
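For reference, a small sketch of how the two interval types can be compared on the complete example data alone, using the `conf.type` argument of `survfit()`. It only illustrates the single-dataset reference result, not the pooled one, and it fits one overall curve instead of one per treatment to keep the output short.

```r
library(survival)

# Example data from this post (complete, no imputation involved)
surv_data <- data.frame(time   = c(4,3,1,1,2,2,3,5,2,4,5,1),
                        status = c(1,1,1,0,1,1,0,0,1,1,0,0),
                        x      = c(0,2,1,1,4,7,0,1,1,2,0,1),
                        sex    = c(0,0,0,0,1,1,1,1,0,1,0,0))

# Same KM estimate, two different confidence interval constructions
fit_plain  <- survfit(Surv(time, status) ~ 1, data = surv_data,
                      conf.type = "plain")
fit_loglog <- survfit(Surv(time, status) ~ 1, data = surv_data,
                      conf.type = "log-log")

# Side-by-side comparison of the limits at each observed time
ci <- data.frame(time         = fit_plain$time,
                 surv         = fit_plain$surv,
                 plain_lower  = fit_plain$lower,
                 plain_upper  = fit_plain$upper,
                 loglog_lower = fit_loglog$lower,
                 loglog_upper = fit_loglog$upper)
print(ci)
```

The log-log limits are built on the log(-log(S)) scale and transformed back, so they stay between 0 and 1 by construction.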
But isn't it too simple, too naive? Would you accept such a solution? Pooling with Rubin's rules assumes approximate normality, and I have heard that the complementary log-log transformation should be applied to the survival probability before pooling. This is not the same as the log-log method of obtaining CIs for the survival probability used in survfit(), but it is related: it's the "mirror". What's your opinion on cloglog vs. log-log?
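To make the question concrete, here is a sketch of pooling on a transformed scale. The assumptions are mine: I pool theta = log(-log(S)), the scale behind `conf.type = "log-log"` in `survfit()`, which is the same as the complementary log-log of the event probability 1 - S. The standard error on that scale comes from the delta method, se(S) / (S * |log(S)|). `pool_km_cloglog()` is a hypothetical helper, not a function of mice or survival, and it requires 0 < S < 1 at every requested time.

```r
library(survival)

# Hypothetical helper (not part of mice or survival): pools KM estimates
# on the log(-log(S)) scale and back-transforms the pooled interval.
pool_km_cloglog <- function(datasets, times, level = 0.95) {
  m <- length(datasets)  # number of completed datasets, assumed >= 2
  theta <- u <- matrix(NA_real_, nrow = m, ncol = length(times))
  for (i in seq_len(m)) {
    s <- summary(survfit(Surv(time, status) ~ 1, data = datasets[[i]]),
                 times = times, extend = TRUE)
    # Assumes 0 < S < 1; otherwise the transform is not finite
    theta[i, ] <- log(-log(s$surv))
    # Delta-method variance of theta: (se(S) / (S * log(S)))^2
    u[i, ] <- (s$std.err / (s$surv * log(s$surv)))^2
  }
  qbar <- colMeans(theta)         # pooled estimate on the transformed scale
  ubar <- colMeans(u)             # within-imputation variance
  b    <- apply(theta, 2, var)    # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b  # total variance
  df   <- ifelse(b > 0, (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2, Inf)
  half <- qt(1 - (1 - level) / 2, df) * sqrt(tvar)
  # S = exp(-exp(theta)) is decreasing in theta, so the limits swap
  data.frame(time  = times,
             surv  = exp(-exp(qbar)),
             lower = exp(-exp(qbar + half)),
             upper = exp(-exp(qbar - half)))
}

# Example data from this post, replicated 10 times instead of real imputations
surv_data <- data.frame(time   = c(4,3,1,1,2,2,3,5,2,4,5,1),
                        status = c(1,1,1,0,1,1,0,0,1,1,0,0),
                        x      = c(0,2,1,1,4,7,0,1,1,2,0,1),
                        sex    = c(0,0,0,0,1,1,1,1,0,1,0,0))
pooled_cll <- pool_km_cloglog(rep(list(surv_data), 10), times = 1:5)
print(pooled_cll)
```

With identical datasets the between-imputation variance is zero, so the result should reduce to the log-log interval of a single fit. With truly different imputations the pooled survival is the back-transformed mean of theta, which is not the same as the arithmetic mean of the per-dataset survival estimates.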
PS: If so, maybe you could consider adding pooling for KM estimates to mice, just to shorten the calculations and minimize the risk of errors/typos?