Closed bgall closed 7 years ago
This is actually a very good point, I encountered this issue yesterday as well. I don't think dotwhisker::by_2sd()
does the same as the standardize()
method recommended by Gelman (2008).
Let's run a simple example.
First, I use Gelman's method who applies the following transformation (subtracting the mean and dividing by 2 standard deviations – see for this exact formula also Gelman and Hill (2007: 56)).
library(dotwhisker)
library(broom)
library(dplyr)
mtcars <- mtcars %>%
mutate(disp_by_2sd_divided = (disp - mean(disp)) / (2 * sd(disp)))
# Regression using the new variable
reg1_by_2sd_manual <- lm(mpg ~ disp_by_2sd_divided + factor(cyl), data = mtcars) %>%
tidy()
Now do the same with dotwhisker::by_2sd
:
# Regression with by_2sd() function
reg2 <- lm(mpg ~ disp + factor(cyl), data = mtcars)
reg2_by_2sd_dotwhisker <- tidy(reg2) %>% dotwhisker::by_2sd(mtcars)
Let's compare the output.
reg1_by_2sd_manual
# term estimate std.error statistic p.value
# 1 (Intercept) 23.234066 1.601981 14.503335 1.510070e-14
# 2 disp_by_2sd_divided -6.769195 2.629915 -2.573921 1.563814e-02
# 3 factor(cyl)6 -4.785846 1.649823 -2.900825 7.168524e-03
# 4 factor(cyl)8 -4.792086 2.886818 -1.659989 1.080770e-01
reg2_by_2sd_dotwhisker
# term estimate std.error statistic p.value
# 1 (Intercept) 29.534768 1.426621 20.702610 1.636895e-18
# 2 disp -6.769195 2.629915 -2.573921 1.563814e-02
# 3 factor(cyl)6 -4.785846 1.649823 -2.900825 7.168524e-03
# 4 factor(cyl)8 -4.792086 2.886818 -1.659989 1.080770e-01
While both methods give the same estimate for the independent variables, the values of the intercept differ. When I apply Gelman's standardize
function from the arm package, I get the result of the reg1_by_2sd_manual
regression, and not the reg2_by_2sd_dotwhisker
result.
# > arm::display(standardize(reg2))
# lm(formula = mpg ~ z.disp + factor(cyl), data = mtcars)
# coef.est coef.se
# (Intercept) 23.23 1.60
# z.disp -6.77 2.63
# factor(cyl)6 -4.79 1.65
# factor(cyl)8 -4.79 2.89
# ---
# n = 32, k = 4
# residual sd = 2.95, R-Squared = 0.78
So dotwhisker::by_2sd()
does obviously not replicate standardize()
. I think this problem should be addressed. Does dotwhisker::by_2sd()
have any advantage over Gelman's calculation? Otherwise, we might think about changing the formula to Gelman's proposed transformation in the dotwhisker package to avoid confusion.
Thanks, @stefan-mueller, for working up an example. It makes clear two points:
x1
equals x/(2*sd(x))
, then, for x
and x1
to predict the same change in y (which they must, for x1
simply rescales x
), the coefficient of x1
has to be 2*sd(x)
times bigger than the coefficient of x
. (Sorry, this would be even easier to make clear if I could insert some LaTeX math here, but apparently that's not straightforward.)by_2sd()
does not also subtract the mean as standardize()
does, the intercept is shifted by the coefficient times the mean of the variable (for all variables in the model). I tend to think the intercept isn't interesting and so I exclude it from my plots anyway, but I can imagine that maybe there's a context where it makes sense to care about the intercept, and it also makes sense forby_2sd()
to mimic standardize()
as closely as possible. I'll totally welcome a PR from anyone who wants to implement that. 😃 Thanks for your reply, this makes sense. I slightly extended the details section in by_2sd.R
to show the difference to arm::standardize
(see PR). If I find time, I might change the implementation, but as you say, I don't know whether this is really necessary as the intercept is not important in most cases.
At least we have now shown the difference to standardize
and highlighted that our estimates, standard errors, and p-values are the exact same for all IVs as if we used Gelman's (2008) formula.
dotwhisker::by_2sd() multiplies the coefficients by two standard deviations of the raw data. Both the CRAN documentation and the documentation here on Github link to the Gelman (2008) article, but the Gelman article actually divides by two standard deviations (the title of the article is "Scaling regression inputs by dividing by two standard deviations"). It would be helpful to add a few words to the documentation noting the difference in comparison to the paper to avoid others needlessly checking of the source code to confirm it is multiplying although the paper divides.