Wrong P-values and simulated estimates in multi-arm trials

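When conducting randomization inference (RI) for multi-arm trials, the p-values for all but the first treatment arm are wrong. As the histogram of simulations shows, there seems to be a major issue with the way ri2 simulates statistics for the third and later arms (T3 and T4 in the reprex below). Under the sharp null hypothesis of no effect, the simulated ATEs should be centred around 0. They are not, so the resulting p-values are wrong: in the output below, the two-tailed p-values for T3 and T4 come out as 1 despite large estimates.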

# reprex
library(reprex)
#> Warning: package 'reprex' was built under R version 4.3.3

# three arm experiment
library(ri2)
#> Loading required package: randomizr
#> Loading required package: estimatr
library(DeclareDesign)
#> Loading required package: fabricatr

N = 100
set.seed(1)

declaration_test <- declare_ra(
  N = N,
  num_arms = 4)

set.seed(123)
real_T <- conduct_ra(declaration = declaration_test)
ATE_t2 = -.5
ATE_t3 = .6
ATE_t4 = 1

Y = -.2 + ATE_t2 * (real_T == "T2") + ATE_t3 * (real_T == "T3") + 
  ATE_t4 * (real_T == "T4") + rnorm(n = N, sd = .2)
data_test = data.frame(Y, real_T)
result = conduct_ri(formula = Y ~ real_T,
                    assignment = 'real_T',
                    declaration = declaration_test, 
                    IPW = FALSE,
                    p = 'two-tailed',
                    data = data_test,
                    sharp_hypothesis = 0)

plot(result)

summary(result, p = 'two-tailed')
#>       term   estimate two_tailed_p_value
#> 1 real_TT2 -0.4817978                  0
#> 2 real_TT3  0.5587833                  1
#> 3 real_TT4  0.9834681                  1
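
For comparison, here is a minimal sketch of the same check done by hand in base R, without ri2. It is not the package's implementation. The helper names `draw_assignment()` and `estimate()` are made up for illustration, and the assignment is assumed to be complete random assignment with 25 units per arm. Under the sharp null the outcomes stay fixed and only the assignment is re-drawn, so the simulated estimates for every arm should be centred near 0.

```r
# Manual randomization-inference check in base R (no ri2 involved).
set.seed(123)
N <- 100
arms <- c("T1", "T2", "T3", "T4")

# Hypothetical helper: complete random assignment, 25 units per arm.
draw_assignment <- function() {
  factor(sample(rep(arms, length.out = N)), levels = arms)
}

real_T <- draw_assignment()
Y <- -0.2 - 0.5 * (real_T == "T2") + 0.6 * (real_T == "T3") +
  1 * (real_T == "T4") + rnorm(N, sd = 0.2)

# Hypothetical helper: estimates for T2, T3, T4 relative to T1.
estimate <- function(z) coef(lm(Y ~ z))[-1]

observed <- estimate(real_T)

# Sharp null of no effect: Y is held fixed, only the assignment varies.
# Result is a 1000 x 3 matrix, one column per treatment arm.
sims <- t(replicate(1000, estimate(draw_assignment())))

# Mean of the simulated estimates per arm; should be close to 0.
sim_means <- colMeans(sims)

# Two-tailed p-value: share of simulated |estimates| >= |observed estimate|.
p_values <- colMeans(sweep(abs(sims), 2, abs(observed), ">="))

print(round(sim_means, 3))
print(p_values)
```

With effects this large relative to the noise, one would expect small p-values for all three arms from a check like this, which is why values of 1 for T3 and T4 look like a bug rather than a property of the data.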

acoppock / ri2

Wrong P-values and simulated estimates in multi-arm trials #33