chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/
55 stars 27 forks source link

Subsetting in flexsurvspline #6

Closed WetRobot closed 9 years ago

WetRobot commented 9 years ago

Hi,

I've run into some minor issues / room for minor enhancements using flexsurvspline. Namely, subsetting requires the user to point to the appropriate data set when subsetting instead of the usual behaviour when using e.g. coxph. Additionally, a subset that leads to zero rows in an ad-hoc created factor variable crashes flexsurvspline.

> flex <- flexsurvspline(Surv(time = time, event = status) ~ sex + cut(age, 4), 
                                          data = survival::lung)
> cp <- coxph(Surv(time = time, event = status) ~ sex + cut(age, 4), 
                      data = survival::lung, 
                      subset = !is.na(wt.loss))
> subflex <- flexsurvspline(Surv(time = time, event = status) ~ sex + cut(age, 4), 
                                          data = survival::lung,
                                          subset = !is.na(wt.loss))
Error in flexsurvspline(Surv(time = time, event = status) ~ sex + cut(age,  : 
  object 'wt.loss' not found
> subflex <- flexsurvspline(Surv(time = time, event = status) ~ sex + cut(age, 4), 
                                          data = survival::lung, 
                                          subset = !is.na(survival::lung$wt.loss))
> subflex <- flexsurvspline(Surv(time = time, event = status) ~ sex + cut(age, 4), 
                                          data = survival::lung, 
                                          subset = survival::lung$age > 60)
Error in optim(method = "BFGS", par = c(-6.03077305549557, 1.06257724751466,  : 
  non-finite value supplied by optim
In addition: Warning message:
In coxph(as.formula(form), weights = wt) :
  X matrix deemed to be singular; variable 2 4

These are minor issues, but I only realized after a day or two to refer to the data set explicitly. Also, in my view, an ad-hoc created factor such as cut(age, 4) in the example should possibly only be evaluated on the subsetted data set?

Thank you for the great package by the way, it works very well in general.

chjackson commented 9 years ago

Thanks for the report - acknowledged as a bug, and intend to fix. I find this kind of scoping and environment stuff a pain - this is particularly tricky in this case, as flexsurvspline works as a wrapper for flexsurvreg.

WetRobot commented 9 years ago

In my limited experience with wrappers I usually substitute-evaluate the subset argument within the wrapper and pass a logical vector onwards.

> df <- data.frame(a=1:5)
> 
> wf <- function(x, subset) {
+   subset <- substitute(subset)
+   subset <- eval(subset, envir = x, enclos = parent.frame())
+   print(subset)
+   ## wrapper does something
+   x$b <- 5:1
+   
+   f(x, subset = subset)
+   
+ }
> 
> f <- function(x, subset) {
+   subset <- substitute(subset)
+   subset <- eval(subset, envir = x, enclos = parent.frame())
+   print(subset)
+   ## wrapped function does something else
+   print(mean(x[subset, ]$b))
+   
+ }
> 
> wf(df, a == 1)
[1]  TRUE FALSE FALSE FALSE FALSE
[1]  TRUE FALSE FALSE FALSE FALSE
[1] 5
chjackson commented 9 years ago

Thanks - I think this is fixed in the latest commit.

WetRobot commented 9 years ago

Thanks, works like a charm!

> flexsurvspline(Surv(time = time, event = status) ~ sex 
+                + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), 
+                data = survival::lung, 
+                subset = survival::lung$age > 60)
Call:
flexsurvspline(formula = Surv(time = time, event = status) ~ 
    sex + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), data = survival::lung, 
    subset = survival::lung$age > 60)

Estimates: 
                                                        data mean  est      L95%     U95%     se       exp(est)  L95%     U95%   
gamma0                                                       NA    -6.0057  -7.3686  -4.6428   0.6954       NA        NA       NA
gamma1                                                       NA     1.2280   1.0330   1.4229   0.0995       NA        NA       NA
sex                                                      1.3358    -0.6841  -1.1248  -0.2434   0.2248   0.5045    0.3247   0.7839
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[63,69)    0.4104    -0.5784  -1.3113   0.1545   0.3739   0.5608    0.2695   1.1671
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[69,Inf)   0.5000    -0.3029  -1.0133   0.4075   0.3624   0.7387    0.3630   1.5030

N = 134,  Events: 101,  Censored: 33
Total time at risk: 39063
Log-likelihood = -694.8459, df = 5
AIC = 1399.692

> 
> flexsurvspline(Surv(time = time, event = status) ~ sex 
+                + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), 
+                data = survival::lung, 
+                subset = age > 60)
Call:
flexsurvspline(formula = Surv(time = time, event = status) ~ 
    sex + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), data = survival::lung, 
    subset = age > 60)

Estimates: 
                                                        data mean  est      L95%     U95%     se       exp(est)  L95%     U95%   
gamma0                                                       NA    -6.0057  -7.3686  -4.6428   0.6954       NA        NA       NA
gamma1                                                       NA     1.2280   1.0330   1.4229   0.0995       NA        NA       NA
sex                                                      1.3358    -0.6841  -1.1248  -0.2434   0.2248   0.5045    0.3247   0.7839
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[63,69)    0.4104    -0.5784  -1.3113   0.1545   0.3739   0.5608    0.2695   1.1671
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[69,Inf)   0.5000    -0.3029  -1.0133   0.4075   0.3624   0.7387    0.3630   1.5030

N = 134,  Events: 101,  Censored: 33
Total time at risk: 39063
Log-likelihood = -694.8459, df = 5
AIC = 1399.692

> 
> flexsurvspline(Surv(time = time, event = status) ~ sex 
+                + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), 
+                data = survival::lung[lung$age > 60,])
Call:
flexsurvspline(formula = Surv(time = time, event = status) ~ 
    sex + cut(age, c(0, 56, 63, 69, Inf), right = FALSE), data = survival::lung[lung$age > 
    60, ])

Estimates: 
                                                        data mean  est      L95%     U95%     se       exp(est)  L95%     U95%   
gamma0                                                       NA    -6.0057  -7.3686  -4.6428   0.6954       NA        NA       NA
gamma1                                                       NA     1.2280   1.0330   1.4229   0.0995       NA        NA       NA
sex                                                      1.3358    -0.6841  -1.1248  -0.2434   0.2248   0.5045    0.3247   0.7839
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[63,69)    0.4104    -0.5784  -1.3113   0.1545   0.3739   0.5608    0.2695   1.1671
cut(age, c(0, 56, 63, 69, Inf), right = FALSE)[69,Inf)   0.5000    -0.3029  -1.0133   0.4075   0.3624   0.7387    0.3630   1.5030

N = 134,  Events: 101,  Censored: 33
Total time at risk: 39063
Log-likelihood = -694.8459, df = 5
AIC = 1399.692