easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0
437 stars 36 forks source link

degrees of freedom not included in summary outputs #613

Closed IndrajeetPatil closed 3 years ago

IndrajeetPatil commented 3 years ago
library(PROreg)
set.seed(123)

# defining the parameters
k <- 100
m <- 10
phi <- 0.5
beta <- c(1.5, -1.1)
sigma <- 0.5

# simulating the covariate and random effects
x <- runif(k, 0, 10)
X <- model.matrix(~x)
z <- as.factor(rBI(k, 4, 0.5, 2))
Z <- model.matrix(~ z - 1)
u <- rnorm(5, 0, sigma)

# the linear predictor and simulated response variable
eta <- beta[1] + beta[2] * x + crossprod(t(Z), u)
p <- 1 / (1 + exp(-eta))
y <- rBB(k, m, p, phi)
dat <- data.frame(cbind(y, x, z))
dat$z <- as.factor(dat$z)

# apply the model
mod_BBmm <-
  PROreg::BBmm(
    fixed.formula = y ~ x,
    random.formula = ~z,
    m = m,
    data = dat
  )
#> Iteration number: 1 
#> Iteration number: 2 
#> Iteration number: 3 
#> Iteration number: 4 
#> Iteration number: 5

library(parameters)

dof(mod_BBmm)
#> [1] 98 98

parameters(mod_BBmm)
#> # Fixed Effects
#> 
#> Parameter   | Log-Odds |   SE |         95% CI |     t |      p
#> ---------------------------------------------------------------
#> (Intercept) |     1.28 | 0.37 | [ 0.56,  2.00] |  3.48 | < .001
#> x           |    -1.12 | 0.16 | [-1.44, -0.80] | -6.90 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

Same issue for BBreg:

# setup
set.seed(18)
library(PROreg)

# we simulate a covariate, fix the paramters of the beta-binomial
# distribution and simulate a response variable.
# then we apply the model, and try to get the same values.
k <- 1000
m <- 10
x <- rnorm(k, 5, 3)
beta <- c(-10, 2)
p <- 1 / (1 + exp(-1 * (beta[1] + beta[2] * x)))
phi <- 1.2
y <- PROreg::rBB(k, m, p, phi)

# model
mod_BBreg <- PROreg::BBreg(y ~ x, m)

library(parameters)

dof(mod_BBreg)
#> [1] 998 998

parameters(mod_BBreg)
#> # Fixed Effects
#> 
#> Parameter | Log-Odds |   SE |          95% CI |      t |      p
#> ---------------------------------------------------------------
#> Intercept |   -10.13 | 0.48 | [-11.08, -9.19] | -21.00 | < .001
#> x         |     2.03 | 0.09 | [  1.85,  2.22] |  21.51 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

Ditto for multinom:

# setup
set.seed(123)
library(nnet)
library(MASS)
utils::example(topic = birthwt, echo = FALSE)

# model
bwt.mu <-
  nnet::multinom(
    formula = low ~ .,
    data = bwt,
    trace = FALSE
  )

library(parameters)

dof(bwt.mu)
#>  [1] 178 178 178 178 178 178 178 178 178 178 178

parameters(bwt.mu)
#> Parameter    | Log-Odds |       SE |         95% CI |     t |     p
#> -------------------------------------------------------------------
#> (Intercept)  |     0.82 |     1.24 | [-1.62,  3.26] |  0.66 | 0.508
#> age          |    -0.04 |     0.04 | [-0.11,  0.04] | -0.96 | 0.336
#> lwt          |    -0.02 | 7.08e-03 | [-0.03,  0.00] | -2.21 | 0.027
#> race [black] |     1.19 |     0.54 | [ 0.14,  2.24] |  2.22 | 0.026
#> race [other] |     0.74 |     0.46 | [-0.16,  1.65] |  1.60 | 0.109
#> smokeTRUE    |     0.76 |     0.43 | [-0.08,  1.59] |  1.78 | 0.075
#> ptd [TRUE]   |     1.34 |     0.48 | [ 0.40,  2.29] |  2.80 | 0.005
#> htTRUE       |     1.91 |     0.72 | [ 0.50,  3.33] |  2.65 | 0.008
#> uiTRUE       |     0.68 |     0.46 | [-0.23,  1.59] |  1.46 | 0.143
#> ftv [1]      |    -0.44 |     0.48 | [-1.38,  0.50] | -0.91 | 0.363
#> ftv [2+]     |     0.18 |     0.46 | [-0.72,  1.07] |  0.39 | 0.695
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

And mhurdle:

# setup
data("Interview", package = "mhurdle")
library(mhurdle)
#> Loading required package: Formula
#> Loading required package: truncreg
#> Loading required package: maxLik
#> Loading required package: miscTools
#> 
#> Please cite the 'maxLik' package as:
#> Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
#> 
#> If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
#> https://r-forge.r-project.org/projects/maxlik/
#> Loading required package: survival
#> Loading required package: texreg
#> Version:  1.37.5
#> Date:     2020-06-17
#> Author:   Philip Leifeld (University of Essex)
#> 
#> Consider submitting praise using the praise or praise_interactive functions.
#> Please cite the JSS article in your publications -- see citation("texreg").

# independent double hurdle model
idhm <- mhurdle::mhurdle(
  formula = vacations ~ car + size | linc + linc2 | 0,
  data = Interview,
  dist = "ln",
  h2 = TRUE,
  method = "bfgs"
)

library(parameters)

dof(idhm)
#> [1] 992 992 992 992 992 992 992 992

parameters(idhm)
#> # Fixed Effects
#> 
#> Parameter   | Coefficient |   SE |        95% CI |     t |      p
#> -----------------------------------------------------------------
#> (Intercept) |       -0.20 | 0.22 | [-0.64, 0.24] | -0.91 | 0.364 
#> linc        |        1.34 | 0.18 | [ 0.99, 1.68] |  7.63 | < .001
#> linc2       |        0.13 | 0.14 | [-0.13, 0.40] |  0.98 | 0.327 
#> 
#> # Zero-Inflated
#> 
#> Parameter   | Log-Odds |   SE |         95% CI |     t |      p
#> ---------------------------------------------------------------
#> (Intercept) |    -1.05 | 0.17 | [-1.39, -0.72] | -6.12 | < .001
#> car         |     0.04 | 0.07 | [-0.10,  0.17] |  0.54 | 0.586 
#> size        |     0.19 | 0.05 | [ 0.09,  0.29] |  3.59 | < .001
#> 
#> # Auxiliary
#> 
#> Parameter | Coefficient |   SE |       95% CI |    t |      p
#> -------------------------------------------------------------
#> sd sd     |        1.13 | 0.13 | [0.87, 1.39] | 8.60 | < .001
#> pos       |        0.59 | 0.18 | [0.24, 0.93] | 3.33 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

And truncreg:

set.seed(123)
library(truncreg)
#> Loading required package: maxLik
#> Loading required package: miscTools
#> 
#> Please cite the 'maxLik' package as:
#> Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
#> 
#> If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
#> https://r-forge.r-project.org/projects/maxlik/
library(survival)

# data
data("tobin", package = "survival")

# model
cragg_trunc <-
  truncreg::truncreg(
    formula = durable ~ age + quant,
    data = tobin,
    subset = durable > 0
  )

library(parameters)

dof(cragg_trunc)
#> [1] 3 3 3 3

parameters(cragg_trunc)
#> # Fixed Effects
#> 
#> Parameter   | Coefficient |   SE |         95% CI |     t |      p
#> ------------------------------------------------------------------
#> (Intercept) |       12.60 | 9.22 | [-5.47, 30.67] |  1.37 | 0.172 
#> age         |        0.43 | 0.23 | [-0.01,  0.88] |  1.90 | 0.058 
#> quant       |       -0.12 | 0.03 | [-0.18, -0.05] | -3.64 | < .001
#> sigma       |        1.65 | 0.57 | [ 0.54,  2.77] |  2.91 | 0.004
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

And gls:

set.seed(123)
library(nlme)

# model
mod_gls <-
  nlme::gls(
    model = follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
    data = Ovary,
    correlation = corAR1(form = ~ 1 | Mare)
  )

library(parameters)

dof(mod_gls)
#> [1] 305 305 305

parameters(mod_gls)
#> # Fixed Effects
#> 
#> Parameter          | Coefficient |   SE |         95% CI |     t |      p
#> -------------------------------------------------------------------------
#> (Intercept)        |       12.22 | 0.66 | [10.91, 13.52] | 18.38 | < .001
#> sin(2 * pi * Time) |       -2.77 | 0.65 | [-4.04, -1.51] | -4.30 | < .001
#> cos(2 * pi * Time) |       -0.90 | 0.70 | [-2.27,  0.47] | -1.29 | 0.198
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

And rq:

set.seed(123)
library(quantreg)
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve

# data
data(stackloss)

# model
mod_rq <-
  quantreg::rq(
    formula = stack.loss ~ .,
    data = stackloss,
    method = "br"
  )

library(parameters)

dof(mod_rq)
#> [1] 17 17 17 17

parameters(mod_rq)
#> # Fixed Effects
#> 
#> Parameter   | Coefficient |   SE |           95% CI |     t |      p
#> --------------------------------------------------------------------
#> (Intercept) |      -39.69 | 7.14 | [-53.69, -25.69] | -5.56 | < .001
#> Air Flow    |        0.83 | 0.13 | [  0.58,   1.08] |  6.55 | < .001
#> Water Temp  |        0.57 | 0.34 | [ -0.10,   1.24] |  1.68 | 0.111 
#> Acid Conc   |       -0.06 | 0.06 | [ -0.18,   0.06] | -1.01 | 0.328
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

nlrq:

# setup
set.seed(123)
library(quantreg)
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve

# preparing data
Dat <- NULL
Dat$x <- rep(1:25, 20)
Dat$y <- stats::SSlogis(Dat$x, 10, 12, 2) * rnorm(500, 1, 0.1)

# then fit the median using nlrq
Dat.nlrq <-
  quantreg::nlrq(
    formula = y ~ SSlogis(x, Asym, mid, scal),
    data = Dat,
    tau = 0.5,
    trace = FALSE
  )

library(parameters)

dof(Dat.nlrq)
#> [1] 497 497 497

parameters(Dat.nlrq)
#> # Fixed Effects
#> 
#> Parameter | Coefficient |   SE |         95% CI |      t |      p
#> -----------------------------------------------------------------
#> Asym      |        9.82 | 0.11 | [ 9.58, 10.05] |  84.98 | < .001
#> mid       |       11.84 | 0.07 | [11.70, 11.98] | 173.72 | < .001
#> scal      |        1.96 | 0.02 | [ 1.91,  2.01] |  81.93 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

ivprobit:

# setup
set.seed(123)
library(ivprobit)
data(eco)

# model
pro <-
  ivprobit::ivprobit(
    formula = d2 ~ ltass + roe + div | eqrat + bonus | ltass + roe + div + gap + cfa,
    data = eco
  )

library(parameters)

dof(pro)
#> [1] 788 788 788 788 788 788

parameters(pro)
#> # Fixed Effects
#> 
#> Parameter |  Log-Odds |       SE |          95% CI |     t |     p
#> ------------------------------------------------------------------
#> Intercep  |    -16.86 |     9.73 | [-35.94,  2.21] | -1.73 | 0.084
#> ltass     |      0.95 |     0.62 | [ -0.27,  2.16] |  1.53 | 0.127
#> roe       |      0.07 |     0.11 | [ -0.16,  0.29] |  0.59 | 0.557
#> div       |  2.04e-06 | 4.27e-06 | [ -0.00,  0.00] |  0.48 | 0.634
#> eqrat     |      9.44 |    12.94 | [-15.92, 34.79] |  0.73 | 0.466
#> bonus     | -1.42e-06 | 2.82e-06 | [ -0.00,  0.00] | -0.50 | 0.615
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

garch:

set.seed(123)
library(tseries)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
data(EuStockMarkets)

# model
dax <- diff(log(EuStockMarkets))[, "DAX"]
dax.garch <- tseries::garch(x = dax, trace = FALSE)

library(parameters)

dof(dax.garch)
#> [1] 1856 1856 1856

parameters(dax.garch)
#> # Fixed Effects
#> 
#> Parameter | Coefficient |       SE |       95% CI |     t |      p
#> ------------------------------------------------------------------
#> a0        |    4.64e-06 | 1.96e-06 | [0.00, 0.00] |  6.14 | < .001
#> a1        |        0.07 |     0.01 | [0.05, 0.09] |  6.07 | < .001
#> b1        |        0.89 |     0.02 | [0.86, 0.92] | 53.82 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using
#>   Wald t-distribution approximation.

Created on 2021-10-16 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

I think that's all for now. Sorry, Daniel 😢

mattansb commented 3 years ago

Need to boost that coverage...

IndrajeetPatil commented 3 years ago

Need to boost that coverage...

We support so many models, that it's kind of impossible to check all of them 😢 , nor should we try to...

IndrajeetPatil commented 3 years ago

Another one: bigglm

set.seed(123)
library(biglm)
#> Loading required package: DBI
data(trees)

# model
mod_bigglm <-
  biglm::bigglm(
    formula = log(Volume) ~ log(Girth) + log(Height),
    data = trees,
    chunksize = 10,
    sandwich = TRUE
  )

library(parameters)

dof(mod_bigglm)
#> [1] 28 28 28
parameters(mod_bigglm)
#> # Fixed Effects
#> 
#> Parameter    | Coefficient |   SE |         95% CI |     t |      p
#> -------------------------------------------------------------------
#> (Intercept)  |       -6.63 | 0.73 | [-8.06, -5.21] | -9.11 | < .001
#> Girth [log]  |        1.98 | 0.06 | [ 1.87,  2.09] | 35.62 | < .001
#> Height [log] |        1.12 | 0.19 | [ 0.74,  1.49] |  5.82 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using a
#>   Wald t-distribution approximation.

Created on 2021-10-17 by the reprex package (v2.0.1)

IndrajeetPatil commented 3 years ago

Also complmrob:

# setup
set.seed(123)
library(complmrob)

# data
crimes <- data.frame(
  lifeExp = state.x77[, "Life Exp"],
  USArrests[, c("Murder", "Assault", "Rape")]
)

# model
mUSArr <- complmrob::complmrob(formula = lifeExp ~ ., data = crimes)

library(parameters)

dof(mUSArr)
#> [1] 46 46 46 46

parameters(mUSArr)
#> # Fixed Effects
#> 
#> Parameter   | Coefficient |   SE |         95% CI |      t |      p
#> -------------------------------------------------------------------
#> (Intercept) |       70.65 | 0.43 | [69.82, 71.49] | 165.81 | < .001
#> Murder      |       -2.23 | 0.43 | [-3.07, -1.40] |  -5.24 | < .001
#> Assault     |       -1.05 | 0.93 | [-2.86,  0.77] |  -1.13 | 0.264 
#> Rape        |        3.28 | 0.61 | [ 2.09,  4.47] |   5.39 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed using a
#>   Wald t-distribution approximation.

Created on 2021-10-17 by the reprex package (v2.0.1)

mattansb commented 3 years ago

We support so many models, that it's kind of impossible to check all of them 😢 , nor should we try to...

I guess we don't need to if we have you!

strengejacke commented 3 years ago

BBreg/BBmm only report p-values based on normal distribution assumptions, even if we have t-stat and residual df. How should we deal with that?

strengejacke commented 3 years ago

@IndrajeetPatil would be great if you can add tests for these models here: https://github.com/easystats/parameters/blob/main/tests/testthat/test-model_parameters_df.R

I started adding first tests, you may just copy that pattern - we need to check whether df, CI (low) and p are correct for default (which should be residual for models in this thread) and normal.

Would be great if p-values are not rounded to 0, so we see differences between normal and default (residual).

Just run model_parameters(model), and then create the code for the test file using something like:

dput(round(params$CI_low, 5))
strengejacke commented 3 years ago

I bet we will discover some inconsistencies, in particular where model classes have "fixed" p-values. We then have to make sure that p_value() for these classes both allows to return the default and the "normally distributed" p-values (like I changed now for ivprobit).

strengejacke commented 3 years ago

actually, we only need to make degrees_of_freedom() work properly for each class, then we could remove almost all p_value() and ci() methods, and only use p_value.default() and ci.default() now.

strengejacke commented 3 years ago

let's see if I'm right...

bwiernik commented 3 years ago

Is this a printing issue or is the df column missing entirely? Would it be easier to just always print the df column instead of the clever but inconsistent header formatting?

strengejacke commented 3 years ago

Is this a printing issue

It's a printing issue, because Inf is erroneously returned now, not the residual df. It's due to the "API" changes, so no issue of the print-method itself, but rather a (new introduced) bug due to the recent changes from #570. But I think I fixed all of them now, and added a bunch of tests. We may add more tests in the future, to cover as many models as possible. We can add all tests to this file:

https://github.com/easystats/parameters/blob/main/tests/testthat/test-model_parameters_df.R

This is only run on GitHub, so no timing issues on CRAN.

strengejacke commented 3 years ago

Ok, I think it's working now, and it's implemented for a lot of models. There are still some classes where we only return the "default" p and CI. That is something we can revise step-by-step in future updates, as it's not that urgent. The most common model classes should be supported and work correct now.

IndrajeetPatil commented 3 years ago

It's a printing issue

It was not a printing issue though. The degrees of freedom column was missing from the dataframe itself.

Thanks for fixing this quickly!