easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0

Add model sig for linear models #174

Open · mattansb opened this issue 3 years ago

mattansb commented 3 years ago

This is something that's often reported, but I couldn't find an easystats way to get it?

m <- lm(mpg ~ ., data = mtcars)
summary(m)
#> 
#> Call:
#> lm(formula = mpg ~ ., data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.4506 -1.6044 -0.1196  1.2193  4.6271 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) 12.30337   18.71788   0.657   0.5181  
#> cyl         -0.11144    1.04502  -0.107   0.9161  
#> disp         0.01334    0.01786   0.747   0.4635  
#> hp          -0.02148    0.02177  -0.987   0.3350  
#> drat         0.78711    1.63537   0.481   0.6353  
#> wt          -3.71530    1.89441  -1.961   0.0633 .
#> qsec         0.82104    0.73084   1.123   0.2739  
#> vs           0.31776    2.10451   0.151   0.8814  
#> am           2.52023    2.05665   1.225   0.2340  
#> gear         0.65541    1.49326   0.439   0.6652  
#> carb        -0.19942    0.82875  -0.241   0.8122  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.65 on 21 degrees of freedom
#> Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
#> F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

I'm talking about the last line here: F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07.

(On a related note: maybe compare_performance(m1, m2) could give the significance tests from anova(m1, m2)?)
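(For reference, a minimal base-R way to pull those numbers out yourself today; summary.lm() stores the F statistic and its degrees of freedom but not the p-value, which has to be computed from them:)

fs <- summary(m)$fstatistic   # named vector: value, numdf, dendf
p_val <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
c(fs, p = unname(p_val))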

DominiqueMakowski commented 3 years ago
report::report_performance(lm(mpg ~ ., data = mtcars))
#> The model explains a significant and substantial proportion of variance (R2 = 0.87, F(10, 21) = 13.93, p < .001, adj. R2 = 0.81)

Created on 2020-12-08 by the reprex package (v0.3.0)

mattansb commented 3 years ago

Okay smarty pants... Then can I also have this here? :P

DominiqueMakowski commented 3 years ago

That information is in the attributes returned by model_performance(), if I recall correctly.

strengejacke commented 3 years ago

It's actually included in r2():

library(performance)
model <- lm(mpg ~ wt + cyl, data = mtcars)
str(r2(model))
#> List of 2
#>  $ R2         : Named num 0.83
#>   ..- attr(*, "names")= chr "R2"
#>  $ R2_adjusted: Named num 0.819
#>   ..- attr(*, "names")= chr "adjusted R2"
#>  - attr(*, "p")= Named num 6.81e-12
#>   ..- attr(*, "names")= chr "value"
#>  - attr(*, "F")= Named num 70.9
#>   ..- attr(*, "names")= chr "value"
#>  - attr(*, "df")= Named num 2
#>   ..- attr(*, "names")= chr "numdf"
#>  - attr(*, "df_residual")= Named num 29
#>   ..- attr(*, "names")= chr "dendf"
#>  - attr(*, "model_type")= chr "Linear"
#>  - attr(*, "class")= chr "r2_generic"

Created on 2020-12-08 by the reprex package (v0.3.0)

We thought we might need this information, but never really developed a stringent "API" or concept for how to include it. Thus, we could have our own function here, maybe something like model_statistics()?
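As a rough interim sketch (a hypothetical helper, not an existing function), such a model_statistics() could simply collect the attributes that r2() already attaches, as shown in the str() output above:

model_statistics <- function(model) {
  out <- performance::r2(model)
  # gather the test statistics that r2() stores as attributes
  data.frame(
    F           = as.numeric(attr(out, "F")),
    df          = as.numeric(attr(out, "df")),
    df_residual = as.numeric(attr(out, "df_residual")),
    p           = as.numeric(attr(out, "p"))
  )
}

model_statistics(lm(mpg ~ wt + cyl, data = mtcars))
# per the str() output above: F = 70.9, df = 2, df_residual = 29, p = 6.81e-12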

mattansb commented 3 years ago

Though not a fit index itself, it is related to measures of model fit - that is at least where I would expect to see it. (Similar to how in compare_performance() we have the "tests" of BFs between the models.)

bwiernik commented 3 years ago

As an alternative, one idea might be to encourage the notion that these sorts of test statistics are model comparisons against a null model. To that end, perhaps there could be a generic null_model() function that estimates a null model for the given model, which could be passed to the various test_*() functions.

For example:

m <- lm(formula = mpg ~ wt + qsec + am, data = mtcars)
n <- null_model(m) # same as n <- lm(formula = mpg ~ 1, data = mtcars)

test_wald(m, n) # Yields the traditional F statistic, df, p value
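(For linear models this is exactly the nested-model F test that base R already provides, so a Wald comparison against the intercept-only model should reproduce the overall F statistic from summary():)

n_base <- lm(mpg ~ 1, data = mtcars)   # intercept-only fit on the same data
anova(n_base, m)                       # F, df, and p match summary(m)'s F-statistic line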
DominiqueMakowski commented 3 years ago

one idea might be to encourage the notion that these sorts of test statistics are model comparisons against a null model

That's very true; it often gets understood as a property of the model itself rather than the result of an implicit comparison.

m <- lm(formula = mpg ~ wt + qsec + am, data = mtcars)
performance::test_wald(m)
#> Error: At least two models are required to test them.

Created on 2021-04-04 by the reprex package (v1.0.0)

In the case above, we could throw an informative warning saying "Only one model was provided; we assumed you wanted to compare it against the null model, so we computed that for you. Please call test_wald(model, null_model(model)) explicitly to remove this warning."?
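Something along these lines, just as a sketch of the fallback logic (names are illustrative, not the actual implementation):

# sketch: fall back to the null model when only one model is passed
test_wald_fallback <- function(model, ...) {
  if (length(list(...)) == 0) {
    message(
      "Only one model was provided, so it was compared against its null model. ",
      "Call test_wald(model, null_model(model)) explicitly to remove this message."
    )
    return(performance::test_wald(model, null_model(model)))  # null_model() as proposed above
  }
  performance::test_wald(model, ...)
}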

bwiernik commented 3 years ago

That’s a good idea.

strengejacke commented 3 years ago

For which models is this appropriate? Linear? Linear mixed? GLMs? Only those that return a deviance / deviance residuals?

bwiernik commented 3 years ago

Hmm, these are the "null" models that seem to be estimated most often:

  1. Linear: Intercept-only model
  2. GLM: Intercept-only model, same family and link
  3. Linear mixed: Model with population- and group-level intercepts only.
  4. GLMM: Models with population- and group-level intercepts only, same family and link.

If things like dispersion or zero-inflation models are included, a reasonable default choice would be to retain those features with intercepts only. Certainly, as you get into mixed models or zi/dispersion models, there are many alternative reasonable "null" models to use. Perhaps a message should state this when null_model() is used with mixed/zi/dispersion/etc. models?
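As a rough sketch (hypothetical, not an existing API), a null_model() generic along these lines could cover the lm/glm and lme4 cases; update() keeps the data, family, and link of the original fit, and glmmTMB models with zi/dispersion components would need their own method:

null_model <- function(model, ...) UseMethod("null_model")

# linear models and GLMs: intercept-only refit (family/link carried over by update())
null_model.default <- function(model, ...) {
  update(model, formula. = . ~ 1)
}

# lme4 mixed models: drop the fixed effects, keep the random-effects terms
# (a stricter null could reduce these to random intercepts only)
null_model.merMod <- function(model, ...) {
  re <- lme4::findbars(formula(model))
  re_chr <- vapply(re, function(x) paste0("(", deparse(x), ")"), character(1))
  update(model, formula. = stats::reformulate(c("1", re_chr), response = "."))
}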

What sort of models aren't returning deviance?

strengejacke commented 3 years ago

glmmTMB doesn't seem to return a deviance, nor residuals(type = "deviance").

bwiernik commented 3 years ago

glmmTMB includes deviance, but it seems that the default method doesn't work:

library(glmmTMB)

# model with a zero-inflation component (both conditional and zi parts depend on mined)
(m1 <- glmmTMB(count ~ mined + (1 | site),
               zi = ~mined,
               family = poisson, data = Salamanders))
summary(m1)

# "null" counterpart: intercept-only conditional and zi parts, random intercept kept
(m2 <- glmmTMB(count ~ 1 + (1 | site),
               zi = ~1,
               family = poisson, data = Salamanders))
summary(m2)

# insight retrieves the deviance even though the default method doesn't (see above)
insight::get_deviance(m1)
insight::get_deviance(m2)

They don't have a method for deviance residuals, though.

iago-pssjd commented 2 years ago

Coming here from https://github.com/vincentarelbundock/modelsummary/issues/482, I was going to open a duplicate of issue #200, but then found it and ended up here. Is there any chance of adding the F-statistic, the degrees of freedom, and the p-value to model_performance()? Any updates on this issue?

Thanks!

strengejacke commented 2 years ago

No update, but there's a good chance this will be included. I'm wondering which statistics/p-values are relevant for which models? As far as I know, not many (or not all) models report such statistics.