chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

Add S3 method for anova generic #76

Closed mattwarkentin closed 2 years ago

mattwarkentin commented 4 years ago

Hi Chris,

I thought this discussion deserved a separate thread. Any interest in adding an S3 method for the anova() generic? I was looking at which generics are supported for other survival-type model fit objects, to see where there might be gaps in the existing support for flexsurvreg. #75 will add generic support for predict, residuals (possibly), tidy, glance, and augment; I also noticed you added nobs to the dev version.

In addition to the intrinsic value of adding this method, it would also mean we could include these statistics (log-likelihood for the null model, chi-squared statistic, and p-value) in our glance() method, in line with the glance methods for other survival models (e.g. coxph, survreg, etc.).

Off the top of my head I can envision two ways to implement this: (1) add a null model fit (i.e. a model without covariates) to the flexsurvreg object, or (2) fit the null model at runtime when anova() is called (probably via an internal function). I'm not sure whether it is bad form to do model fitting inside anova(), but the former would require some work on the model-fitting side.
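
A rough sketch of option (2), just to illustrate the idea - the method body is hypothetical, and it assumes update() can refit the model from its stored call, and that the fitted object exposes loglik and npars components:

# Hypothetical sketch of option (2): refit the null model when anova() is called
anova.flexsurvreg <- function(object, ...) {
  # Refit with all covariates dropped (intercept-only model, same distribution)
  null_fit <- update(object, . ~ 1)
  # Likelihood ratio test of the fitted model against the null model
  chisq <- 2 * (object$loglik - null_fit$loglik)
  df <- object$npars - null_fit$npars
  data.frame(loglik = c(null_fit$loglik, object$loglik),
             Chisq  = c(NA, chisq),
             Df     = c(NA, df),
             p      = c(NA, pchisq(chisq, df, lower.tail = FALSE)))
}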

Thoughts?

chjackson commented 4 years ago

I think the idea of a function to compare a set of fitted models is good, though this is the kind of place where it is good to be opinionated and promote good practice, rather than follow historical convention. I've never used the anova function in other packages, so I don't know what people would expect it to contain.

The help page for the generic says it could include analysis of deviance - assuming that means comparing likelihoods, that's fine. However, a common use of flexsurv is to compare non-nested models with different parametric distributions, so it should include AIC (people will then expect BIC too). A tricky thing would be to determine which models are nested, and hence which ones can be compared with a likelihood ratio test - but if we have AIC then I don't think that would add much information.

I'm not sure about automated model fitting and comparison, though. Will users expect it to work on a single model with covariates, by dropping some or all covariates and then comparing? I'm not sure about defaulting to a comparison between the current model and a model with no covariates - I think I'd prefer to leave it to users to decide which models are sensible to compare in their application and to fit those models themselves; the anova function can then compare them.
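
For instance, users might fit a couple of candidate distributions themselves and then ask for a comparison. A quick sketch of that workflow (using the ovarian data from survival; the AIC component of flexsurvreg objects is assumed here):

library(flexsurv)
# Two non-nested parametric models for the same data and covariates
fit_wei <- flexsurvreg(Surv(futime, fustat) ~ rx, data = ovarian, dist = "weibull")
fit_ggm <- flexsurvreg(Surv(futime, fustat) ~ rx, data = ovarian, dist = "gengamma")
# Non-nested, so compare by AIC rather than a likelihood ratio test
c(weibull = fit_wei$AIC, gengamma = fit_ggm$AIC)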

mattwarkentin commented 4 years ago

For anova(), the convention I have seen is that if a single model object is passed, the default comparison is against a model with no covariates. Not a particularly exciting comparison, but this is the default heuristic. If multiple models are passed, they are compared via likelihood ratio tests, but it is still up to the user to determine whether the comparisons are sensible. For example:

c1 <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
c2 <- coxph(Surv(futime, fustat) ~ rx, data = ovarian)
c3 <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)

anova(c1, c2, c3)
Analysis of Deviance Table
 Cox model: response is  Surv(futime, fustat)
 Model 1: ~ age
 Model 2: ~ rx
 Model 3: ~ age + rx
   loglik  Chisq Df P(>|Chi|)    
1 -27.838                        
2 -34.459 13.242  0 < 2.2e-16 ***
3 -27.042 14.835  1 0.0001174 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this way, we could

Regarding the comparison of non-nested models, I definitely agree that this is a good place to take an opinionated stance, and it will require more thought about how it should be done. This may not need to be part of any generic, but it could be useful to provide users with a function for comparing non-nested models.

chjackson commented 4 years ago

(did some of your comment get eaten?)

anova.glm defaults to no hypothesis test - it just compares deviances, for any model comparison, which is fine for non-nested models. Users add a test argument if they want a test. If we do that, why not append a column for AIC?

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm1 <- glm(counts ~ outcome, family = poisson())
glm2 <- glm(counts ~ treatment, family = poisson())
anova(glm1, glm2)
Analysis of Deviance Table

Model 1: counts ~ outcome
Model 2: counts ~ treatment
  Resid. Df Resid. Dev Df Deviance
1         6     5.1291            
2         6    10.5814  0  -5.4523
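
Along those lines, the default output for flexsurvreg fits could just be a table of fit statistics with no test - roughly something like this sketch (the function name is made up, and the loglik, npars and AIC components of flexsurvreg objects are assumed):

# Hypothetical sketch: tabulate fit statistics for several flexsurvreg models,
# with no hypothesis test unless the user asks for one
compare_fits <- function(...) {
  fits <- list(...)
  data.frame(loglik = sapply(fits, function(f) f$loglik),
             npars  = sapply(fits, function(f) f$npars),
             AIC    = sapply(fits, function(f) f$AIC))
}
# e.g. for the Weibull and generalized gamma fits sketched earlier
compare_fits(fit_wei, fit_ggm)
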
mattwarkentin commented 4 years ago

Hmm, I think I just started a thought and never finished it.

I like this suggestion. Seems like a suitable fit. In any case, glad to see there is interest. We can shelve this for now. I still have work to do on the other generics but this can be a future TODO.