dbaranger / InteractionPoweR

InteractionPoweR: Power analysis for interactions via simulation in R
GNU General Public License v3.0

Interaction term power in a model with a binary factor and a continuous covariate #7

Closed pdeninis closed 3 months ago

pdeninis commented 3 months ago

Hi, I would like to use your package to estimate the power of the interaction term in simple models with two treatments and a baseline covariate, where the assumption of parallel slopes is not tenable but the estimated interaction term does not differ significantly from zero.

I read in the paper that the package works with continuous and ordinal variables as well as binary factors, so I was wondering how to compute the correlation between the binary factor and the continuous covariate. If I understand correctly, a simple Pearson correlation is all that is needed.

However, Pearson's correlation requires numeric variables, so the coding of the factor matters. Different attempts gave quite different estimates:

> DF <-
structure(list(y = c(120, 100, 100, 100, 100, 100, 100, 140, 
100, 83.329999999999998, 100, 100, 100, 100, 120, 100, 100, 100, 
100, 100, 100, 100, 100, 100, 80, 100, 100, 100, 66.659999999999997, 
100, 75, 85.709999999999994, 100, 100, 100, 66.659999999999997, 
75, 80, 80, 100), x1 = c(6, 7, 7, 5, 4, 6, 6, 6, 7, 7, 5, 4, 
5, 5, 6, 7, 5, 4, 6, 6, 6, 4, 5, 4, 7, 5, 5, 6, 7, 5, 5, 8, 6, 
6, 7, 7, 6, 6, 7, 7), x2 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), levels = c("Control", "Test"), class = "factor")), class = "data.frame", row.names = c(NA, 
-40L))
attach(DF)

### Coding 1 and 2
> r.int<-cor(x1*as.numeric(x2), y);r.int
[1] 0.3413899
> r.tr<-cor(as.numeric(x2), y);r.tr
[1] 0.4755837
> r.cov<-cor(x1, y);r.cov
[1] -0.2668464
> r.xx<-cor(as.numeric(x2),x1);r.xx
[1] -0.1194887

> test_power<-power_interaction_r2(
  alpha = 0.05,            # alpha, for the power analysis
  N = 40,                     # sample size
  r.x1x2.y = r.int,         # interaction effect to test (correlation between x1*x2 and y)
  r.x1.y = r.cov,           # correlation between x1 and y
  r.x2.y = r.tr,              # correlation between x2 and y
  r.x1.x2 =r.xx             # correlation between x1 and x2
 )
>  
> test_power
        pwr
1 0.6585761

> test_power<-power_interaction(
  n.iter=9999,
  alpha = 0.05,                 # alpha, for the power analysis
  N = seq(10, 60, 5),        # sample size
  r.x1x2.y = r.int,             # interaction effect to test (correlation between x1*x2 and y)
  r.x1.y = r.tr,                  # correlation between x1 and y
  r.x2.y = r.cov,               # correlation between x2 and y
  r.x1.x2 =r.xx                 # correlation between x1 and x2
 )
Performing 109989 simulations
>  
> test_power
    N       pwr
1  10 0.1240124
2  15 0.2216222
3  20 0.3185319
4  25 0.4259426
5  30 0.5034503
6  35 0.5864586
7  40 0.6467647
8  45 0.7017702
9  50 0.7578758
10 55 0.8034803
11 60 0.8338834

> windows(8,4);plot_power_curve(power_data = test_power, # output from power_interaction()
                  power_target = .8,        # the power we want to achieve 
                  x = "N" # x variable
                  )

(power curve plot: coding 1 and 2)

### Coding 0 and 1 
> r.int<-cor(x1*(as.numeric(x2)-1), y);r.int
[1] 0.4629379
> r.tr<-cor(as.numeric(x2)-1, y);r.tr
[1] 0.4755837
> r.cov<-cor(x1, y);r.cov
[1] -0.2668464
> r.xx<-cor(as.numeric(x2)-1,x1);r.xx
[1] -0.1194887

> test_power<-power_interaction_r2(
  alpha = 0.05,             # alpha, for the power analysis
  N = 40,                   # sample size
  r.x1x2.y = r.int,           # interaction effect to test (correlation between x1*x2 and y)
  r.x2.y = r.tr,             # correlation between x2 and y
  r.x1.y = r.cov,            # correlation between x1 and y
  r.x1.x2 =r.xx             # correlation between x1 and x2
 )

> test_power
        pwr
1 0.8945711

test_power<-power_interaction(
  n.iter=9999,
  alpha = 0.05,             # alpha, for the power analysis
  N = seq(10, 60, 5),        # sample size
  r.x1x2.y = r.int,           # interaction effect to test (correlation between x1*x2 and y)
  r.x1.y = r.tr,             # correlation between x1 and y
  r.x2.y = r.cov,            # correlation between x2 and y
  r.x1.x2 =r.xx             # correlation between x1 and x2
 )
Performing 109989 simulations
>  
> test_power
    N       pwr
1  10 0.2093209
2  15 0.4018402
3  20 0.5626563
4  25 0.6880688
5  30 0.7862786
6  35 0.8492849
7  40 0.8953895
8  45 0.9302930
9  50 0.9528953
10 55 0.9707971
11 60 0.9782978

> windows(8,4);plot_power_curve(power_data = test_power, # output from power_interaction()
                  power_target = .8,        # the power we want to achieve 
                  x = "N" # x variable
                  )

(power curve plot: coding 0 and 1)

But the most bewildering observation is that, contrary to the literature stating that interaction terms usually require about four times the sample size needed for the main effects to detect a comparable effect, the data simulated from my correlations show the opposite situation.


### Original model
> model<-lm(y~x1*x2, data=DF)
> summary(model)

Call:
lm(formula = y ~ x1 * x2, data = DF)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.411  -3.316  -3.086   4.932  36.799 

Coefficients:
            Estimate Std. Error t value       Pr(>|t|)    
(Intercept)  121.513     14.731   8.249 0.000000000815 ***
x1            -5.220      2.437  -2.142          0.039 *  
x2Test       -19.004     21.047  -0.903          0.373    
x1:x2Test      5.336      3.561   1.498          0.143    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.67 on 36 degrees of freedom
Multiple R-squared:  0.3137,    Adjusted R-squared:  0.2565 
F-statistic: 5.486 on 3 and 36 DF,  p-value: 0.003292

### Simulated model
### CODING 1 and 2 
r.int<-cor(x1*as.numeric(x2), y);r.int
r.tr<-cor(as.numeric(x2), y);r.tr
r.cov<-cor(x1, y);r.cov
r.xx<-cor(as.numeric(x2),x1);r.xx

set.seed(9)
example_data<-generate_interaction(
 N = 40,                   # sample size
 r.x1x2.y = r.int,           # interaction effect to test (correlation between x1*x2 and y)
 r.x1.y = r.cov,             # correlation between x1 and y
 r.x2.y = r.tr,            # correlation between x2 and y
 r.x1.x2 =r.xx             # correlation between x1 and x2
)

test_interaction(data = example_data, # simulated data
                 q = 2                # number of simple slopes
                 )
$linear.model
       Estimate Std. Error   t value       Pr(>|t|)
x1   -0.1967019 0.11343993 -1.733974 0.091480126373
x2    0.4007279 0.11150478  3.593818 0.000967665885
x1x2  0.5178065 0.09202079  5.627061 0.000002190362

In the original model x1 was the sole term with p<0.05 while the interaction p was 0.143. In the simulated model the opposite is true.

How can the behavior of these simulations be reconciled with what is known from the literature? Perhaps the standardization of the variables plays a role? And if it does, is it possible to back-transform the power so that it is representative of the model with non-standardized variables? How should the correlations be computed to get the correct result from your package?

Thanks in advance .

Paolo

dbaranger commented 3 months ago

Hi! This package uses the correlation coefficient as the input parameter, so of course you will get different results if you provide different inputs. The statement regarding ordinal variables in the package documentation refers to the fact that you can use the simulation function to simulate ordinal or binary variables. Note that the package makes distributional assumptions (i.e., variables are symmetric) which may or may not be met by your own data.

You reference statements in the literature that interactions need '4 times' the sample size to be detected. This is not because interactions are somehow statistically complicated, making them difficult to detect. Rather, this recommendation comes from multiple observations across fields that replicable interactions are small, so one should plan on detecting an interaction that is quite small (perhaps ~1/2 the size of the main effects, yielding a sample size recommendation of 4x the original).

The simulated dataset you reference at the end is drawn from a population with the parameters you supplied; it is not an exact copy, so it is standard for the p-values to change across simulations. The power estimates for the interaction and the main effects in essence tell you how frequently these parameters were significant across all simulations. Best, David
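To see where the 4x heuristic comes from numerically: for a simple correlation, the N required for a given power scales roughly with 1/atanh(r)^2, so halving the effect size roughly quadruples the required sample size. A back-of-the-envelope sketch using the Fisher-z approximation (a rough analytic illustration, not what InteractionPoweR itself computes):

```r
# Approximate N needed to detect a correlation r at a given alpha and power,
# via the Fisher-z approximation (rough analytic sketch, base R only)
n_needed <- function(r, alpha = 0.05, power = 0.80) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling((z / atanh(r))^2 + 3)
}
n_needed(0.30)  # main effect of r = .30 -> N = 85
n_needed(0.15)  # interaction half as large -> N = 347, about 4x
```

So an interaction half the size of an r = .30 main effect needs roughly four times the N, which is where the recommendation above comes from.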

pdeninis commented 3 months ago

Thank you for your quick answer. Everything you said is understandable to me. What I meant is that, in calculating power, the critical piece of information is the MCID (the minimum clinically important difference), which, if I have understood correctly, is in your procedure somehow implied by the use of correlations. In particular, I was thinking that the model I showed includes unstandardized variables (so the value of a coefficient, relating to a 1-unit difference in a variable, depends on the scale), while I presume that the effects obtained by means of correlations relate to differences of 1 SD. If I am not wrong, that would mean that the effects represented by my coefficients are mostly smaller than 1 SD, which agrees with my knowledge of the field.

So what I would like to know is whether you think your package is mostly or exclusively intended for the analysis of standardized variables, or whether it may also be used for cases like the one I showed, in which case there would be a way to back-transform the results and make them appropriate for unstandardized variables. Alternatively, perhaps there is actually no difference between standardized and unstandardized variables (because the effect to be detected is in both cases divided by the SD), which would imply that I have to reconsider my view of the matter, as your answer would seem to indicate.

The second thing I would like to know is whether or not you think the correct way to calculate the correlation between the response and the interaction term of a factor with a continuous variable is the first one I showed above:

r.int<-cor(x1*as.numeric(x2), y)

i.e., by coding the factor as a numeric variable made up of 1s and 2s (which should be equivalent to the point-biserial correlation), considering that the alternative coding (numeric 0 and 1) gives wildly different results.

Of course, if my hypotheses are both wrong, it would be great if you could show me the correct way to do it using the data I provided.

Your package could be even more useful if these facets were clarified. I believe it needs and deserves an in-depth discussion of the meaning of its results for them to be fully exploited. I appreciate your work and your help a lot, and I apologize if I am being too demanding.

Paolo

dbaranger commented 3 months ago

Hi! Regarding standardization - the package uses correlations as the input, and gives effect sizes in standardized units. We think this is reasonable, as many indicators of clinically meaningful effect sizes (e.g., Cohen's D) are standardized metrics. Using standardized measures also just makes the math a lot easier.
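For example, if the clinically meaningful effect is expressed as a Cohen's d between two equal-sized groups, the standard conversion to a point-biserial correlation is r = d / sqrt(d^2 + 4). A quick sketch (the d value is purely illustrative; the conversion assumes equal group sizes):

```r
# Convert Cohen's d to point-biserial r, assuming equal group sizes
d_to_r <- function(d) d / sqrt(d^2 + 4)
d_to_r(0.5)  # a "medium" d of 0.5 corresponds to r ~ 0.24
```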

Regarding computing correlations, you should follow the same procedure you would follow if you were going to enter these variables into a linear regression (because that is what the package is in fact running). The reason you are getting different answers in your two power analyses is that you are not mean-centering x1 and x2 before computing the interaction term.

pdeninis commented 3 months ago

Maybe I have figured out what my problem was. If correlations are the input to your software and they are interpreted as regression coefficients, they have to be correlations of standardized variables. If my reasoning is correct, there is no possible choice: both variables have to be standardized before the correlations are calculated.

I tried to see whether this approach would provide a sensible result.

> r.int<-cor(scale(x1,scale=FALSE)*scale(as.numeric(x2),scale=FALSE), y);r.int
          [,1]
[1,] 0.2199223
> r.tr<-cor(scale(as.numeric(x2),scale=FALSE), y);r.tr
          [,1]
[1,] 0.4755837
> r.cov<-cor(scale(x1,scale=FALSE), y);r.cov
           [,1]
[1,] -0.2668464
> r.xx<-cor(scale(as.numeric(x2),scale=FALSE),x1);r.xx
           [,1]
[1,] -0.1194887

> test_power<-power_interaction(
  n.iter=9999,
  alpha = 0.05,             # alpha, for the power analysis
  N = seq(40, 160, 5),        # sample size
  r.x1x2.y = r.int,           # interaction effect to test (correlation between x1*x2 and y)
  r.x1.y = r.tr,             # correlation between x1 and y
  r.x2.y = r.cov,            # correlation between x2 and y
  r.x1.x2 =r.xx             # correlation between x1 and x2
 )
Performing 249975 simulations

> test_power
     N       pwr
1   40 0.3245325
2   45 0.3593359
3   50 0.4052405
4   55 0.4360436
5   60 0.4700470
6   65 0.5039504
7   70 0.5360536
8   75 0.5604560
9   80 0.5917592
10  85 0.6355636
11  90 0.6447645
12  95 0.6704670
13 100 0.6964696
14 105 0.7222722
15 110 0.7414741
16 115 0.7617762
17 120 0.7849785
18 125 0.7983798
19 130 0.8160816
20 135 0.8289829
21 140 0.8452845
22 145 0.8546855
23 150 0.8713871
24 155 0.8751875
25 160 0.8947895

windows(8,4);plot_power_curve(power_data = test_power, # output from power_interaction()
                  power_target = .8,        # the power we want to achieve 
                  x = "N" # x variable
                  )

(power curve plot: centered interaction term)

This would be consistent with my expectations: the power of my model would be 0.324. Do you agree with my hypothesis? Is this what you were trying to tell me?

What I still cannot understand is how to interpret the simulated values:

model<-lm(y~scale(x1)*scale(as.numeric(x2)), data=DF)
summary(model)

Call:
lm(formula = y ~ scale(x1) * scale(as.numeric(x2)), data = DF)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.411  -3.316  -3.086   4.932  36.799 

Coefficients:
                                Estimate Std. Error t value             Pr(>|t|)    
(Intercept)                       97.142      1.859  52.253 < 0.0000000000000002 ***
scale(x1)                         -2.704      1.887  -1.433              0.16035    
scale(as.numeric(x2))              6.115      1.883   3.248              0.00252 ** 
scale(x1):scale(as.numeric(x2))    2.862      1.911   1.498              0.14279    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.67 on 36 degrees of freedom
Multiple R-squared:  0.3137,    Adjusted R-squared:  0.2565 
F-statistic: 5.486 on 3 and 36 DF,  p-value: 0.003292

set.seed(9)
> example_data<-generate_interaction(
  N = 40,                   # sample size
  r.x1x2.y = r.int,           # interaction effect to test (correlation between x1*x2 and y)
  r.x1.y = r.cov,             # correlation between x1 and y
  r.x2.y = r.tr,            # correlation between x2 and y
  r.x1.x2 =r.xx             # correlation between x1 and x2
 )

> test_interaction(data = example_data, # simulated data
                  q = 2                # number of simple slopes
                  )
$linear.model
       Estimate Std. Error   t value    Pr(>|t|)
x1   -0.2062282  0.1264292 -1.631175 0.111572436
x2    0.4191550  0.1242725  3.372870 0.001790809
x1x2  0.4206467  0.1025575  4.101569 0.000223901

Now the x1 and x2 coefficients are consistent with the model, while it seems to me that the interaction term still is not.

dbaranger commented 3 months ago

Not quite. Correlations are always standardized; whether or not you standardize prior to computing the correlation has no effect. What you are seeing is that the interaction term changes depending on whether or not the variables are mean-centered first. This is sometimes referred to as the question of essential vs. non-essential multicollinearity (see https://doi.org/10.3758/s13428-015-0624-x).

Note that these correlations differ:

DF$x2 = as.numeric(DF$x2)

int1 = scale(DF$x1, center = F) * scale(DF$x2, center = F)
int2 = scale(DF$x1, center = T) * scale(DF$x2, center = T)

 cor(DF$y,int1)
          [,1]
[1,] 0.3413899
 cor(DF$y,int2)
          [,1]
[1,] 0.2199223

But the metrics of overall model fit do not differ (F, R², and residual standard error are the same). What changes are the individual model coefficients.

> summary(lm(y ~ x1 + x2 + int1,data = DF))

Call:
lm(formula = y ~ x1 + x2 + int1, data = DF)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.411  -3.316  -3.086   4.932  36.799 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  140.517     33.076   4.248 0.000145 ***
x1           -10.556      5.522  -1.912 0.063918 .  
x2           -19.004     21.047  -0.903 0.372582    
int1          51.209     34.180   1.498 0.142791    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.67 on 36 degrees of freedom
Multiple R-squared:  0.3137,    Adjusted R-squared:  0.2565 
F-statistic: 5.486 on 3 and 36 DF,  p-value: 0.003292

> summary(lm(y ~ x1 + x2 + int2,data = DF))

Call:
lm(formula = y ~ x1 + x2 + int2, data = DF)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.411  -3.316  -3.086   4.932  36.799 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   93.896     12.477   7.525  6.8e-09 ***
x1            -2.553      1.781  -1.433  0.16035    
x2            12.077      3.718   3.248  0.00252 ** 
int2           2.862      1.911   1.498  0.14279    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.67 on 36 degrees of freedom
Multiple R-squared:  0.3137,    Adjusted R-squared:  0.2565 
F-statistic: 5.486 on 3 and 36 DF,  p-value: 0.003292
pdeninis commented 3 months ago

I do not understand your point. Standardizing variables means both mean-centering and scaling, so speaking about standardizing surely involves centering too. I fully agree about the importance of centering, and when I speak about "standardizing" I mean centering as well. I have not thought about it enough, but I suspect there is a reason why the scaling is irrelevant in this case.

> r.int<-cor(scale(x1,scale=FALSE)*scale(as.numeric(x2),scale=FALSE), y);r.int
          [,1]
[1,] 0.2199223
> r.int<-cor(scale(x1,scale=TRUE)*scale(as.numeric(x2),scale=FALSE), y);r.int
          [,1]
[1,] 0.2199223
> r.int<-cor(scale(x1,scale=TRUE)*scale(as.numeric(x2),scale=TRUE), y);r.int
          [,1]
[1,] 0.2199223

Would you agree that the solution is simply to center both variables before calculating the correlations?

dbaranger commented 3 months ago

I think you are following. Correlations scale the covariance by the SDs, so whether or not you rescale first does not matter. Correlations ignore the means of variables, but the means do matter when you compute an interaction term. Read the link I included previously.

pdeninis commented 3 months ago

Thank you, the paper you suggested is very interesting and helpful. As I told you earlier, I almost never have to center or standardize variables in my research field, so I am new to this: it will take me a while to fully assimilate its content.

For now, I got your point: it is not correlations in general that are affected by standardization/centering, but only that of the interaction term, which depends only on the centering.

Am I wrong in saying that the correlation between the interaction term and the response to be passed to your software (as a standardized effect) necessarily requires prior centering of both variables involved in the interaction?

And how can I explain the seemingly unusual result of the generated data (a p-value of 0.000223901 obtained from a test with a power of 0.324)?

dbaranger commented 3 months ago

You are correct in that.

It seems you are referring to the results of a single simulation, which you provided above. If power is 0.324, then 32.4% of simulations will have p<0.05. Remember that the p-value itself is unrelated to power.
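That spread of p-values can be seen directly with a quick simulation in plain R (a minimal sketch, assuming independent standard-normal predictors and an interaction correlation of about 0.22 at N = 40, as in the centered example above; this is not the package's own simulation code):

```r
set.seed(1)
b <- 0.22                        # target correlation between x1*x2 and y
pvals <- replicate(2000, {
  x1 <- rnorm(40)                # N = 40, as in the model above
  x2 <- rnorm(40)
  y  <- b * x1 * x2 + sqrt(1 - b^2) * rnorm(40)
  summary(lm(y ~ x1 * x2))$coefficients["x1:x2", 4]  # interaction p-value
})
mean(pvals < 0.05)  # empirical power: the share of significant interactions
min(pvals)          # individual draws can still give very small p-values
```

With power well below 1, only a minority of draws are significant, yet nothing prevents a single draw from yielding a p-value on the order of 0.0002, exactly as observed.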

pdeninis commented 3 months ago

I really appreciate your help: you are very kind. Above all, I appreciate the way you chose to lead me to the answer, avoiding an incomplete or confusing understanding on my part and thus ensuring correct use of your package.

Thank you again for the paper: interactions are a ubiquitous issue, so it will surely be fruitful.

Paolo