Closed IndrajeetPatil closed 3 years ago
@mattansb You're the expert in df-names, what do you think?
I'm not sure what the problem is that we're trying to address here; @IndrajeetPatil can you elaborate?
The smooth part of the summary has the columns `edf` and `Ref.df`. The EDF (estimated df) is named `Coefficient`, while the `Ref.df` is named `df`.
@IndrajeetPatil was suggesting naming those df1 and df2.
However, after reading https://r.789695.n4.nabble.com/ref-df-in-mgcv-gam-td4756194.html, maybe we can rename EDF to `df_estimated` and omit `Ref.df`? Or what would you suggest?
I would retain both of them, as it would be weird to have an F-statistic with just one degree of freedom. In fact, that's what caught my attention: we had an F-statistic here but just one `df` column.
From what I can tell, the `Ref.df` isn't the usual denominator df of the F test. Instead:
``` r
set.seed(2)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
#> Gu & Wahba 4 term additive model
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
s.table <- summary(b)$s.table

# p-values from an F distribution with df1 = edf and df2 = Ref.df
p1 <- pf(s.table[, 3], df1 = s.table[, 1], df2 = s.table[, 2], lower.tail = FALSE)

# column 1: recomputed p-values; column 2: p-values reported by summary()
cbind(round(p1, 5),
      round(s.table[, 4], 5))
#>          [,1]    [,2]
#> s(x0) 0.07012 0.00013
#> s(x1) 0.00239 0.00000
#> s(x2) 0.00000 0.00000
#> s(x3) 0.28482 0.03782
```
Not the same p-values.
However:
``` r
# same as above, but with the residual df of the model as df2
p1 <- pf(s.table[, 3], df1 = s.table[, 1], df2 = df.residual(b), lower.tail = FALSE)
cbind(round(p1, 5),
      round(s.table[, 4], 5))
#>          [,1]    [,2]
#> s(x0) 0.00041 0.00013
#> s(x1) 0.00000 0.00000
#> s(x2) 0.00000 0.00000
#> s(x3) 0.03782 0.03782
```
Created on 2020-11-28 by the reprex package (v0.3.0)
Hmm, I think we can then use this existing naming schema: `df_num` and `df_denom`
``` r
as.data.frame(parameters::model_parameters(oneway.test(extra ~ group, data = sleep)))
#>          F df_num df_denom          p
#> 1 3.462627      1 17.77647 0.07939414
#>                                                     Method
#> 1 One-way analysis of means (not assuming equal variances)
```
Created on 2020-11-28 by the reprex package (v0.3.0)
Hmmm... except that for one-way ANOVA they should IMO be `df` and `df_error` (also for `t.test`: it should be `df_error`, if we want to keep it consistent...)
In the general linear model framework, there are two types of dfs: the model (effect) df and the error (residual) df.
For t, there is only really the latter (the model df is always 1). For F you need both.
IMO we should be consistently calling them `df` and `df_error` across the easyverse.
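To make the two kinds of df concrete, here is a minimal base R sketch using the built-in `mtcars` data. The variable names `df` and `df_error` are only local illustrations of the proposed naming, not an existing API, and the models are arbitrary examples.

```r
# F test: both the effect df and the error df are needed
fit_f <- lm(mpg ~ factor(cyl), data = mtcars)
aov_tab <- anova(fit_f)
df <- aov_tab$Df[1]        # effect df (number of groups - 1)
df_error <- aov_tab$Df[2]  # error (residual) df
f_value <- aov_tab$`F value`[1]
p_f <- pf(f_value, df1 = df, df2 = df_error, lower.tail = FALSE)
all.equal(p_f, aov_tab$`Pr(>F)`[1])

# t test: the effect df is always 1, so only the error df is needed
fit_t <- lm(mpg ~ wt, data = mtcars)
t_value <- coef(summary(fit_t))["wt", "t value"]
p_t <- 2 * pt(abs(t_value), df = df.residual(fit_t), lower.tail = FALSE)
all.equal(p_t, coef(summary(fit_t))["wt", "Pr(>|t|)"])

# with a single effect df, t^2 equals the F statistic of the same term
all.equal(t_value^2, anova(fit_t)$`F value`[1])
```

The last comparison is the reason a t-statistic can be reported with `df_error` alone, while an F-statistic needs both columns.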
I like it!
I think the `num.df` and `denom.df` choice might have been influenced by `broom`:
``` r
broom::tidy(oneway.test(extra ~ group, data = sleep))
#> Multiple parameters; naming those columns num.df, den.df
#> # A tibble: 1 x 5
#>   num.df den.df statistic p.value method
#>    <dbl>  <dbl>     <dbl>   <dbl> <chr>
#> 1      1   17.8      3.46  0.0794 One-way analysis of means (not assuming equal~
```
Created on 2020-11-28 by the reprex package (v0.3.0.9001)
But we don't have to abide by their naming schema, especially with the strong statistical reasoning you so elegantly outlined to back up our choices! :)
I am pretty elegant, aren't I?
(I think `broom` is coming from a purely statistical standpoint, whereas I see us gearing towards a broader audience ...)
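For illustration, the `df` / `df_error` naming could be applied to the `oneway.test()` example roughly as below. This is a base R sketch that builds the data frame by hand from the `htest` object; it is not how `parameters` does it internally, and the object name `out` is made up for the example.

```r
res <- oneway.test(extra ~ group, data = sleep)

# oneway.test() stores its two dfs in `parameter`,
# named "num df" and "denom df"
out <- data.frame(
  F = unname(res$statistic),
  df = unname(res$parameter["num df"]),
  df_error = unname(res$parameter["denom df"]),
  p = res$p.value,
  Method = res$method
)
out
```

The values are the same as in the outputs above; only the column names change.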
So would you recommend reporting `edf` and `df.residual`, and omitting `Ref.df`?
I think so, yes.
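A rough sketch of what such a smooth-terms table could look like for the `mgcv::gam` example above, assuming the `df` / `df_error` naming. The data frame is assembled by hand from `summary(b)$s.table` and `df.residual(b)`; the name `smooth_terms` and the column layout are only a suggestion, not the actual `parameters` implementation.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
s_table <- summary(b)$s.table

# report edf as `df` and the residual df as `df_error`; Ref.df is dropped
smooth_terms <- data.frame(
  Parameter = rownames(s_table),
  F = s_table[, "F"],
  df = s_table[, "edf"],
  df_error = df.residual(b),
  p = s_table[, "p-value"],
  row.names = NULL
)
smooth_terms
```

The `p` column is taken as reported by `summary()`, so nothing is recomputed here; only the df columns are relabelled.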
@IndrajeetPatil can you please check if it now also works for models of class `scam` and `gamlss`?
Huh, interestingly, the same issue is also true for `scam` objects. Maybe we should be using the same method as for `mgcv::gam`?
``` r
# setup
set.seed(123)
library(scam)
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
#> This is scam 1.2-8.

# data
n <- 200
x1 <- runif(n) * 6 - 3
f1 <- 3 * exp(-x1^2) # unconstrained term
f1 <- (f1 - min(f1)) / (max(f1) - min(f1)) # function scaled to have range [0,1]
x2 <- runif(n) * 4 - 1
f2 <- exp(4 * x2) / (1 + exp(4 * x2)) # monotone increasing smooth
f2 <- (f2 - min(f2)) / (max(f2) - min(f2)) # function scaled to have range [0,1]
f <- f1 + f2
y <- f + rnorm(n) * 0.1
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# model
b <-
  scam(
    y ~ s(x1, k = 15, bs = "cr", m = 2) + s(x2, k = 25, bs = "mpi", m = 2),
    family = gaussian(link = "identity"),
    data = dat,
    not.exp = FALSE
  )

summary(b)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> y ~ s(x1, k = 15, bs = "cr", m = 2) + s(x2, k = 25, bs = "mpi",
#> m = 2)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.2562 0.0703 3.645 0.000347 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(x1) 8.931 10.615 223.4 <2e-16 ***
#> s(x2) 5.620 6.785 380.1 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.9649 Deviance explained = 96.8%
#> GCV score = 0.010902 Scale est. = 0.010054 n = 200

parameters::model_parameters(b)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | Statistic | df | p
#> -----------------------------------------------------------------------------
#> (Intercept) | 0.26 | 0.07 | [0.12, 0.39] | 3.65 | 184.45 | < .001
#>
#> # Smooth Terms
#>
#> Parameter | Coefficient | Statistic | df | p
#> ------------------------------------------------------------
#> Smooth term (x1) | 8.93 | 223.43 | 184.45 | < .001
#> Smooth term (x2) | 5.62 | 380.05 | 184.45 | < .001
```
Created on 2020-11-29 by the reprex package (v0.3.0)
Things look good with `gamlss`.
``` r
# setup
set.seed(123)
library(gamlss)
#> Loading required package: splines
#> Loading required package: gamlss.data
#>
#> Attaching package: 'gamlss.data'
#> The following object is masked from 'package:datasets':
#>
#> sleep
#> Loading required package: gamlss.dist
#> Loading required package: MASS
#> Loading required package: nlme
#> Loading required package: parallel
#> ********** GAMLSS Version 5.2-0 **********
#> For more on GAMLSS look at https://www.gamlss.com/
#> Type gamlssNews() to see new features/changes/bug fixes.

# model
g <-
  gamlss::gamlss(
    formula = y ~ pb(x),
    sigma.fo = ~ pb(x),
    family = BCT,
    data = abdom,
    method = mixed(1, 20)
  )
#> GAMLSS-RS iteration 1: Global Deviance = 4771.925
#> GAMLSS-CG iteration 1: Global Deviance = 4771.013
#> GAMLSS-CG iteration 2: Global Deviance = 4770.994
#> GAMLSS-CG iteration 3: Global Deviance = 4770.994

summary(g)
#> ******************************************************************
#> Family: c("BCT", "Box-Cox t")
#>
#> Call: gamlss::gamlss(formula = y ~ pb(x), sigma.formula = ~pb(x),
#> family = BCT, data = abdom, method = mixed(1, 20))
#>
#> Fitting method: mixed(1, 20)
#>
#> ------------------------------------------------------------------
#> Mu link function: identity
#> Mu Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -64.44299 1.33397 -48.31 <2e-16 ***
#> pb(x) 10.69464 0.05787 184.80 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> Sigma link function: log
#> Sigma Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.65041 0.10859 -24.407 < 2e-16 ***
#> pb(x) -0.01002 0.00380 -2.638 0.00855 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> Nu link function: identity
#> Nu Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.1072 0.6296 -0.17 0.865
#>
#> ------------------------------------------------------------------
#> Tau link function: log
#> Tau Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.4948 0.4261 5.855 7.86e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> NOTE: Additive smoothing terms exist in the formulas:
#> i) Std. Error for smoothers are for the linear effect only.
#> ii) Std. Error for the linear terms maybe are not accurate.
#> ------------------------------------------------------------------
#> No. of observations in the fit: 610
#> Degrees of Freedom for the fit: 11.7603
#> Residual Deg. of Freedom: 598.2397
#> at cycle: 3
#>
#> Global Deviance: 4770.994
#> AIC: 4794.515
#> SBC: 4846.419
#> ******************************************************************

parameters::model_parameters(g)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | t(598.24) | p
#> ------------------------------------------------------------------------
#> (Intercept) | -64.44 | 1.33 | [-67.06, -61.83] | -48.31 | < .001
#> x | 10.69 | 0.06 | [ 10.58, 10.81] | 184.80 | < .001
#>
#> # Sigma
#>
#> Parameter | Coefficient | SE | 95% CI | t(598.24) | p
#> --------------------------------------------------------------------------
#> (Intercept) | -2.65 | 0.11 | [-2.86, -2.44] | -24.41 | < .001
#> x | -0.01 | 3.80e-03 | [-0.02, 0.00] | -2.64 | 0.009
#>
#> # Nu
#>
#> Parameter | Coefficient | SE | 95% CI | t(598.24) | p
#> --------------------------------------------------------------------
#> (Intercept) | -0.11 | 0.63 | [-1.34, 1.13] | -0.17 | 0.865
#>
#> # Tau
#>
#> Parameter | Coefficient | SE | 95% CI | t(598.24) | p
#> --------------------------------------------------------------------
#> (Intercept) | 2.49 | 0.43 | [1.66, 3.33] | 5.86 | < .001
```
Created on 2020-11-29 by the reprex package (v0.3.0)
`edf` (estimated degrees of freedom) here are contained in the `Coefficient` column for `Smooth Terms`.

Maybe we can rename these columns to `df1` and `df2`, and order them as `Parameter`, `F`, `df1`, `df2`, and `p.value`?

Created on 2020-11-27 by the reprex package (v0.3.0)
Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Mojave 10.14.6
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2020-11-27
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date       lib source
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.0.2)
#>  bayestestR    0.7.5.1  2020-11-27 [1] Github (easystats/bayestestR@ba68c88)
#>  callr         3.5.1    2020-10-13 [1] CRAN (R 4.0.2)
#>  cli           2.2.0    2020-11-20 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4    2017-09-16 [1] CRAN (R 4.0.2)
#>  desc          1.2.0    2018-05-01 [1] CRAN (R 4.0.2)
#>  devtools      2.3.2    2020-09-18 [1] CRAN (R 4.0.2)
#>  digest        0.6.27   2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1    2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14     2019-05-28 [1] CRAN (R 4.0.1)
#>  fansi         0.4.1    2020-01-08 [1] CRAN (R 4.0.2)
#>  fs            1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8      2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.0    2020-06-16 [1] CRAN (R 4.0.2)
#>  insight       0.11.0.1 2020-11-27 [1] Github (easystats/insight@7639faf)
#>  knitr         1.30     2020-09-22 [1] CRAN (R 4.0.2)
#>  lattice       0.20-41  2020-04-02 [2] CRAN (R 4.0.3)
#>  magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.0.3)
#>  Matrix        1.2-18   2019-11-27 [2] CRAN (R 4.0.3)
#>  memoise       1.1.0    2017-04-21 [1] CRAN (R 4.0.2)
#>  mgcv        * 1.8-33   2020-08-27 [2] CRAN (R 4.0.3)
#>  nlme        * 3.1-149  2020-08-23 [2] CRAN (R 4.0.3)
#>  parameters    0.9.0.1  2020-11-27 [1] Github (easystats/parameters@abc447c)
#>  pkgbuild      1.1.0    2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgload       1.1.0    2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.0.2)
#>  processx      3.4.4    2020-09-03 [1] CRAN (R 4.0.2)
#>  ps            1.4.0    2020-10-07 [1] CRAN (R 4.0.2)
#>  R6            2.5.0    2020-10-28 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0    2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.9    2020-11-26 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.5      2020-10-21 [1] CRAN (R 4.0.3)
#>  rprojroot     2.0.2    2020-11-15 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.5.3    2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0    2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat      3.0.0    2020-10-31 [1] CRAN (R 4.0.2)
#>  usethis       1.6.3    2020-09-17 [1] CRAN (R 4.0.2)
#>  withr         2.3.0    2020-09-22 [1] CRAN (R 4.0.2)
#>  xfun          0.19     2020-10-30 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1    2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Users/patil/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```