chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

P-values from flexsurv are wrong (half the value they should be) #159

Closed · huftis closed this issue 1 year ago

huftis commented 1 year ago

The P-values from the tidy() method for flexsurvreg models are wrong: they are always half the correct value. Example:

library(flexsurv)
l_fsr = flexsurvreg(Surv(futime, fustat) ~ ecog.ps,
                    dist = "weibull", data = ovarian)
tidy(l_fsr)
#> # A tibble: 3 × 5
#>   term    estimate std.error statistic p.value
#>   <chr>      <dbl>     <dbl>     <dbl>   <dbl>
#> 1 shape      1.11      0.283    NA      NA    
#> 2 scale   2358.     2157.       NA      NA    
#> 3 ecog.ps   -0.431     0.534    -0.808   0.210

The P-value, 0.21, should be twice as large, 0.42. This can be checked by comparing against the same model fitted with survival::survreg():

l_sr = survreg(Surv(futime, fustat) ~ ecog.ps,
               dist = "weibull", data = ovarian)
summary(l_sr)
#> 
#> Call:
#> survreg(formula = Surv(futime, fustat) ~ ecog.ps, data = ovarian, 
#>     dist = "weibull")
#>              Value Std. Error     z      p
#> (Intercept)  7.766      0.915  8.49 <2e-16
#> ecog.ps     -0.431      0.534 -0.81   0.42
#> Log(scale)  -0.108      0.254 -0.43   0.67
#> 
#> Scale= 0.897 
#> 
#> Weibull distribution
#> Loglik(model)= -97.6   Loglik(intercept only)= -98
#>  Chisq= 0.69 on 1 degrees of freedom, p= 0.41 
#> Number of Newton-Raphson Iterations: 5 
#> n= 26

(broom::tidy(l_sr) also returns the P-value 0.42.)
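
For illustration (this calculation is not part of the original report), the two-sided Wald P-value can be recomputed by hand from the estimate and standard error reported above:

# z statistic for ecog.ps from the reported estimate and standard error
z <- -0.431 / 0.534
2 * pnorm(abs(z), lower.tail = FALSE)  # ~0.42, the two-sided P-value, matching survreg
pnorm(abs(z), lower.tail = FALSE)      # ~0.21, the halved value that tidy() reports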

I have checked many examples, and the P-values are always half as large as those from survreg. In theory it could of course be survreg that is wrong, but using simulations from null models I have also checked that the tidy() method for flexsurvreg models only returns P-values in the range 0–0.5, not 0–1.
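
For reference, a minimal sketch of the kind of null-model simulation described above (the exact code used is not shown in the issue; the data and variable names here are made up):

library(flexsurv)
set.seed(1)
p_null <- replicate(100, {
  # Data simulated under the null: the covariate x has no effect on survival
  d <- data.frame(x = rbinom(100, 1, 0.5),
                  time = rweibull(100, shape = 1.2, scale = 5),
                  status = 1)
  fit <- flexsurvreg(Surv(time, status) ~ x, dist = "weibull", data = d)
  tidy(fit)$p.value[3]  # row 3 is the covariate term (after shape and scale)
})
range(p_null)  # with the bug this stays within 0-0.5; it should span roughly 0-1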

chjackson commented 1 year ago

@mattwarkentin ? Is this a distinction between one-sided and two-sided tests? At least whatever flexsurv does here should be documented.

huftis commented 1 year ago

> @mattwarkentin ? Is this a distinction between one-sided and two-sided tests? At least whatever flexsurv does here should be documented.

The code (https://github.com/chjackson/flexsurv-dev/blob/master/R/broom-funs.R#L67) currently says:

pvals <- pnorm(abs(stats), lower.tail = FALSE)

Note the abs() function. So it’s not a one-sided test, just a buggy two-sided test. The output of pnorm(...) should be multiplied by 2.
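
In other words, the corrected line would double the upper-tail probability (a sketch of the required change, not necessarily the exact patch):

# Two-sided Wald P-value: twice the upper-tail probability of |z|
pvals <- 2 * pnorm(abs(stats), lower.tail = FALSE)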

mattwarkentin commented 1 year ago

Seems to just be a bug. Good catch, @huftis. I have pushed a PR with a fix and added a test to guard against such issues in the future.

See #160
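
For context, a regression test along these lines (a sketch only; the actual test added in the PR may differ) would compare the reported P-values against the two-sided Wald formula:

library(flexsurv)
library(testthat)

test_that("tidy() reports two-sided Wald P-values", {
  fit <- flexsurvreg(Surv(futime, fustat) ~ ecog.ps,
                     dist = "weibull", data = ovarian)
  td <- tidy(fit)
  ok <- !is.na(td$statistic)  # the shape and scale rows have no Wald statistic
  expect_equal(td$p.value[ok],
               2 * pnorm(abs(td$statistic[ok]), lower.tail = FALSE))
})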