easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0

Strange behavior in standardise_parameters() #889

Status: Closed (janmbecker closed this issue 1 year ago)

janmbecker commented 1 year ago

Hi, I came across some unexpected results from standardise_parameters(), so I built a reproducible example. In the example below, I simulate data in which y and the predictor have a correlation of 0.5. In a univariate regression, I would therefore expect a standardized estimate of 0.5. However, with method = "refit" the function returns 1.69, so the standardization and refitting do not seem to work properly.

require(parameters)
#> Loading required package: parameters
require(MASS)
#> Loading required package: MASS

set.seed(17367)

sigma <- matrix(c(0.09, 0, 0, 0.27),2,2)
sim.dat <- mvrnorm(500, rep(0,2), sigma)

y <- sim.dat[,1] + sim.dat[,2]

cor(y, sim.dat[,1])
#> [1] 0.4933165

lm.fit <- lm(y~0+sim.dat[,1])
lm.fit
#> 
#> Call:
#> lm(formula = y ~ 0 + sim.dat[, 1])
#> 
#> Coefficients:
#> sim.dat[, 1]  
#>        1.029

standardise_parameters(lm.fit, method = "refit")
#> # Standardization method: refit
#> 
#> Parameter    | Std. Coef. |       95% CI
#> ----------------------------------------
#> sim.dat[, 1] |       1.69 | [1.43, 1.96]

standardise_parameters(lm.fit, method = "posthoc")
#> # Standardization method: posthoc
#> 
#> Parameter    | Std. Coef. |       95% CI
#> ----------------------------------------
#> sim.dat[, 1] |       0.49 | [0.42, 0.57]

lm.fit.stand <- lm(scale(y)~0+scale(sim.dat[,1]))
lm.fit.stand
#> 
#> Call:
#> lm(formula = scale(y) ~ 0 + scale(sim.dat[, 1]))
#> 
#> Coefficients:
#> scale(sim.dat[, 1])  
#>              0.4933

Created on 2023-08-01 with reprex v2.0.2
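
The expected value of 0.5 can be checked by hand from the covariance matrix used in the simulation. This is a short derivation at the population level, so the sample correlation of 0.4933 differs from it only by sampling error. The symbols x_1 and x_2 are notation introduced here for sim.dat[, 1] and sim.dat[, 2]; they do not appear in the original post.

```latex
\begin{aligned}
\operatorname{Var}(x_1) &= 0.09, \quad \operatorname{Var}(x_2) = 0.27, \quad
  \operatorname{Cov}(x_1, x_2) = 0, \quad y = x_1 + x_2 \\
\operatorname{Var}(y) &= 0.09 + 0.27 = 0.36, \qquad
  \operatorname{Cov}(y, x_1) = \operatorname{Var}(x_1) = 0.09 \\
\rho_{y, x_1} &= \frac{0.09}{\sqrt{0.09}\,\sqrt{0.36}}
  = \frac{0.09}{0.3 \times 0.6} = 0.5 \\
\beta_{\text{std}} &= \beta \, \frac{\sigma_{x_1}}{\sigma_y}
  = 1 \times \frac{0.3}{0.6} = 0.5
\end{aligned}
```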

mattansb commented 1 year ago

This is not an issue in {easystats} - the update() function cannot work properly with non-standard term names such as sim.dat[, 1], especially when the variables are not stored in a data frame.

library(parameters)

set.seed(17367)

sigma <- matrix(c(0.09, 0, 0, 0.27), 2, 2)
sim.dat <- MASS::mvrnorm(500, rep(0, 2), sigma)

y <- sim.dat[, 1] + sim.dat[, 2]

cor(y, sim.dat[, 1])
#> [1] 0.4933165

lm.fit <- lm(y ~ 0 + sim.dat[, 1])
lm.fit
#> 
#> Call:
#> lm(formula = y ~ 0 + sim.dat[, 1])
#> 
#> Coefficients:
#> sim.dat[, 1]  
#>        1.029

model_data <- insight::get_data(lm.fit, source = "mf")
model_data_z <- datawizard::standardize(model_data)
update(lm.fit, data = model_data_z)
#> 
#> Call:
#> lm(formula = y ~ 0 + sim.dat[, 1], data = model_data_z)
#> 
#> Coefficients:
#> sim.dat[, 1]  
#>        1.694
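
One plausible reading of the 1.694 above is that only the response gets standardized on refit: `y` is found as a column of the standardized data, while `sim.dat[, 1]` is an expression, so R looks for an object called `sim.dat`, does not find it among the columns, and falls back to the raw matrix in the calling environment. The following base-R sketch illustrates that reading. It uses `rnorm()` and an arbitrary seed instead of the thread's `MASS::mvrnorm()` setup, and `scale()` instead of `datawizard::standardize()`, so the numbers will not match the output above; the names `mf`, `mf_z`, `refit` and `half_std` are illustrative.

```r
set.seed(1)  # arbitrary seed, not the one used in the thread
sim.dat <- cbind(rnorm(500, sd = 0.3), rnorm(500, sd = sqrt(0.27)))
y <- sim.dat[, 1] + sim.dat[, 2]

lm.fit <- lm(y ~ 0 + sim.dat[, 1])

# The model frame stores the predictor under the literal name "sim.dat[, 1]"
mf <- model.frame(lm.fit)
names(mf)

# Standardize every column of the model frame, keeping the column names
mf_z <- mf
mf_z[] <- lapply(mf, function(col) as.numeric(scale(col)))

# Refit on the standardized model frame: `y` comes from mf_z, but
# `sim.dat[, 1]` is re-evaluated against the raw matrix `sim.dat`
refit <- update(lm.fit, data = mf_z)

# Regressing standardized y on the raw predictor gives the same coefficient
half_std <- lm(as.numeric(scale(y)) ~ 0 + sim.dat[, 1])
all.equal(unname(coef(refit)), unname(coef(half_std)))
```

If this reading is right, the refit coefficient should be roughly the raw slope divided by sd(y), which is consistent with 1.029 becoming about 1.69 in the thread's example.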

Solution: name the predictor:

x <- sim.dat[, 1]

lm.fit <- lm(y ~ 0 + x)
lm.fit
#> 
#> Call:
#> lm(formula = y ~ 0 + x)
#> 
#> Coefficients:
#>     x  
#> 1.029

model_data <- insight::get_data(lm.fit, source = "mf")
model_data_z <- datawizard::standardize(model_data)
update(lm.fit, data = model_data_z)
#> 
#> Call:
#> lm(formula = y ~ 0 + x, data = model_data_z)
#> 
#> Coefficients:
#>      x  
#> 0.4933

So now standardise_parameters() also works:

standardise_parameters(lm.fit, method = "refit")
#> # Standardization method: refit
#> 
#> Parameter | Std. Coef. |       95% CI
#> -------------------------------------
#> x         |       0.49 | [0.42, 0.57]

standardise_parameters(lm.fit, method = "posthoc")
#> # Standardization method: posthoc
#> 
#> Parameter | Std. Coef. |       95% CI
#> -------------------------------------
#> x         |       0.49 | [0.42, 0.57]

Created on 2023-08-01 with reprex v2.0.2
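
Since the problem is tied to variables that are not stored in a data frame, another way to avoid it is probably to keep the variables in a data frame and pass it through the `data` argument. This is a minimal base-R sketch under that assumption; the names `dat`, `dat_z`, `fit` and `fit_z` are illustrative, the data are simulated with `rnorm()` and an arbitrary seed rather than the thread's setup, and `scale()` stands in for the standardization step.

```r
set.seed(1)  # arbitrary seed, not the one used in the thread
dat <- data.frame(x = rnorm(500, sd = 0.3))
dat$y <- dat$x + rnorm(500, sd = sqrt(0.27))

# Both variables are plain column names, so a refit can find them in `data`
fit <- lm(y ~ 0 + x, data = dat)

# Standardize all columns, then refit the same formula on the new data
dat_z <- as.data.frame(scale(dat))
fit_z <- update(fit, data = dat_z)

# With one standardized predictor, the slope equals the sample correlation
coef(fit_z)
cor(dat$y, dat$x)
```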

janmbecker commented 1 year ago

If the method doesn't work with matrices, but only with data frames, then maybe it should issue a warning?
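
As a rough illustration of what such a check could look like, the sketch below flags variables that the model formula refers to but that are not columns of the model frame, since a refit on a modified model frame would have to find those somewhere else. `vars_outside_model_frame()` is a hypothetical helper written for this example, not a function from {parameters}, {datawizard} or {insight}, and it is only a heuristic: it also flags transformed terms such as log(x).

```r
# Hypothetical helper: variables used in the formula that are not stored
# under their own name in the model frame
vars_outside_model_frame <- function(model) {
  setdiff(all.vars(formula(model)), names(model.frame(model)))
}

set.seed(1)  # arbitrary seed, not the one used in the thread
sim.dat <- cbind(rnorm(500, sd = 0.3), rnorm(500, sd = sqrt(0.27)))
y <- sim.dat[, 1] + sim.dat[, 2]
x <- sim.dat[, 1]

fit_bad <- lm(y ~ 0 + sim.dat[, 1])  # term is an indexing expression
fit_ok <- lm(y ~ 0 + x)              # term is a plain variable name

vars_outside_model_frame(fit_bad)
vars_outside_model_frame(fit_ok)

# A caller could turn a non-empty result into a warning
bad_vars <- vars_outside_model_frame(fit_bad)
if (length(bad_vars) > 0) {
  message("Refitting may use unstandardized data for: ",
          paste(bad_vars, collapse = ", "))
}
```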

mattansb commented 1 year ago

sim.dat[, 1] is not a column matrix; selecting a single column drops the result to a plain numeric vector:

class(sim.dat[, 1])
#> [1] "numeric"

Matrices are standardized without a problem:

set.seed(17367)

sigma <- matrix(c(0.09, 0, 0, 0.27), 2, 2)
sim.dat <- MASS::mvrnorm(500, rep(0, 2), sigma)

y <- sim.dat[, 1] + sim.dat[, 2]
m <- sim.dat[, 1, drop = FALSE]
class(m)
#> [1] "matrix" "array"

lm.fit <- lm(y ~ 0 + m)
lm.fit
#> 
#> Call:
#> lm(formula = y ~ 0 + m)
#> 
#> Coefficients:
#>     m  
#> 1.029

datawizard::standardise(lm.fit)
#> 
#> Call:
#> lm(formula = y ~ 0 + m, data = data_std)
#> 
#> Coefficients:
#>      m  
#> 0.4933

Created on 2023-08-01 with reprex v2.0.2