Closed · samzl1 closed this issue 1 year ago
Hey there, we're happy to hear you are using our package! I am having my colleague look into this. Here's the general pseudo-code for power calculations (for a fixed additive treatment effect tau):
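# Sketch only: simulate_outcome, nsim, alpha, and tau_grid are placeholders
# for your own DGP and simulation settings.
power = rep(0, length(tau_grid))
for (t in seq_along(tau_grid)) {
  pvals = replicate(nsim, {
    Y = simulate_outcome(tau_grid[t])     # draw an outcome vector under additive effect tau
    clique_test(Y, Z, Tstat, bd)$p.value  # conditional randomization test p-value
  })
  power[t] = mean(pvals < alpha)          # rejection rate = power at this effect size
}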
We'll get back to you shortly with an answer; in the meantime, please confirm that this is what you are doing.
Thanks for the quick response. I think clique_test() is using the biclique decomposition fixed at the outset (which is bd in my code), rather than computing a new biclique decomposition each time. The procedure you described is what I'm doing right now.
I just realized that after I wrote the response. Actually, I recall we updated the clique_test function in exactly this way for these calculations, to separate the decomposition from the test. We're looking into your specific question and will get back to you.
Hi David,
Any update on the power issue?
Hi samzl1,
Thanks for your detailed investigation of the package. This issue, unfortunately, is again due to how we calculate the p-value (or, put differently, due to the sample size not being large enough). Currently the two-sided p-value is calculated by first computing the one-sided version as
M = sum(abs(tvals - tobs) < 1e-14)
p_val = (sum(tvals > tobs) + runif(1) * (1 + M))/(length(tvals) + 1)
and then returning 2*min(p_val, 1-p_val), where tobs is the observed test statistic and tvals is the vector of randomized test statistics.
In the example, when the magnitude of the effect is quite large, the observed tobs is larger than all of the tvals, which should be a clear signal to reject the null. But then p_val becomes U/(n+1), where U ~ Unif(0,1) and n is the length of tvals, which in the example is about 30. Let's take it to be U/n. The decision is based on whether 2*min{U/n, 1-U/n} < alpha = 0.05.
But here comes the problem. Taking n = 30, Prob{2*min{U/n, 1-U/n} < 0.05} = Prob{U < 0.025*n} = 0.75. That is, on average we can reject only 75% of the time, which is why the power in your plot is capped at around 0.8. The issue here is that n is not large enough, so the randomness in the p-value makes Prob{U < 0.025*n} < 1.
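A quick Monte Carlo check of this calculation in plain R (n = 30 and alpha = 0.05 are taken from the example above):
set.seed(1)
U = runif(1e6)                        # the uniform draw entering the randomized p-value
n = 30                                # approximate length of tvals in the example
mean(2 * pmin(U/n, 1 - U/n) < 0.05)   # approximately 0.75, matching Prob{U < 0.025*n}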
This is in general a drawback of such randomized p-values. But in this example a simple remedy is to take the test statistic to be
Tstat = function(y, z, is_focal) {
  # compute each unit's exposure under assignment z
  exposures = rep(0, N)
  for (unit in 1:N) {
    exposures[unit] = exposure_i(z, unit)[1]
  }
  # errors out if all focal units share a single exposure level
  stopifnot("all focals have same exposures" = (length(unique(exposures[is_focal])) > 1))
  # absolute difference in means between the two exposure groups
  abs(mean(y[is_focal & (exposures == 1)]) - mean(y[is_focal & (exposures == 0)]))
}
that is, we look at the absolute difference in means. In this case a large Tstat gives evidence against the null, as in many papers, and we can now make the decision based on the one-sided p-value p_val = (sum(tvals > tobs) + runif(1) * (1 + M))/(length(tvals) + 1). In the code it simply suffices to set testout = clique_test(Y, Z, Tstat, bd, two_sided=F). And this solves the problem:
There's another way to solve it: use a non-random p-value such as pval = 2*min(p, 1-p), where p = mean(tvals > tobs). This amounts to modifying the clique_test function as follows:
clique_test1 = function(Y, Z, teststat, biclique_decom, alpha = 0.05, two_sided = T) {
  stopifnot(`teststat should be specified as a function` = is.function(teststat))
  method = biclique_decom$method
  MNE = biclique_decom$MNE
  # locate the biclique containing the observed assignment (id 1)
  Zobs_id = 1
  Zobs_where = lapply(MNE, function(b) {
    sum(colnames(b$assignments) == Zobs_id) > 0
  })
  Zobs_where = which(Zobs_where == TRUE)
  focal_clique = MNE[[Zobs_where]]
  focal_unit = focal_clique$units
  focal_Zm = focal_clique$assignments
  num_units = dim(focal_Zm)[1]
  focal_unit_indicator = 1:num_units %in% focal_unit
  # compute the test statistic for every assignment in the clique
  test_stats = rep(0, dim(focal_Zm)[2])
  for (zid in 1:length(test_stats)) {
    Z_focal = focal_Zm[, zid]
    test_stats[zid] = teststat(Y, Z_focal, focal_unit_indicator)
  }
  tobs = test_stats[which(colnames(focal_Zm) == Zobs_id)]
  tvals = test_stats[which(colnames(focal_Zm) != Zobs_id)]
  if (anyNA(test_stats)) {
    warning("The test statistic or its randomization distribution contains NA. It may be due to poor selection of focals. Consider rerunning the biclique decomposition.")
    pval = 3
    retlist = list(p.value = pval, statistic = tobs, statistic.dist = tvals,
                   method = method, conditional.clique = focal_clique)
    return(retlist)
  }
  if (sum(abs(tvals - tobs))/length(test_stats) < 1e-14) {
    warning("Randomization distribution is degenerate. p.value is not available. Consider changing the test statistic. See ... for further discussions.")
    pval = 2
    retlist = list(p.value = pval, statistic = tobs, statistic.dist = tvals,
                   method = method, conditional.clique = focal_clique)
    return(retlist)
  }
  stopifnot(`two_sided should be either TRUE or FALSE` = is.logical(two_sided))
  # non-randomized p-value based on the proportion of tvals exceeding tobs
  avg_great = mean(tvals > tobs)
  if (two_sided) {
    pval = 2*min(avg_great, 1-avg_great)
  } else {
    pval = avg_great
  }
  retlist = list(p.value = pval, statistic = tobs, statistic.dist = tvals,
                 method = method, conditional.clique = focal_clique)
  return(retlist)
}
Now you can keep the original test statistic (i.e., without the abs) and use testout = clique_test1(Y, Z, Tstat, bd), which can still give the correct power, as shown below. But as you saw earlier, such a non-randomized p-value performs poorly in the size simulation.
We suggest making Tstat such that a large value indicates rejection, and using the one-sided p-value as the default choice.
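To make the contrast concrete, here is a toy comparison of the three p-values discussed above (the tvals and tobs below are simulated purely for illustration, not produced by the package):
set.seed(1)
tvals = rnorm(30)   # stand-in randomization distribution
tobs = 3            # observed statistic, larger than every tval
M = sum(abs(tvals - tobs) < 1e-14)
p1 = (sum(tvals > tobs) + runif(1) * (1 + M)) / (length(tvals) + 1)
p1                           # one-sided randomized: U/31, always below 0.05 here
2 * min(p1, 1 - p1)          # two-sided randomized: 2U/31, exceeds 0.05 whenever U > 0.775
2 * min(mean(tvals > tobs), 1 - mean(tvals > tobs))  # non-randomized two-sided: exactly 0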
Hi szhuang1,
This is very helpful; thank you for the detailed answer!
Hi David,
Thank you for solving the size issue! Sorry for the back and forth. I really want to implement your package in my paper; that's why I'm digging so deep into the details...
This time it's regarding the power of the test. I'm trying to replicate the results from section 6.2 based on the cluster interference code you have. I used N=300 and K=20, the same as the left-hand plot in the paper. I generated 5,000 different assignments and constructed the null exposure graph for the biclique method with mina = 30, as in the example code. For each \tau, I generated 2,000 data sets from the DGP with the out_bassefeller function and tau_equal = T; \tau varies over 50 equally spaced values from 0 to 1. The code I used to produce the power plot of the test is attached.
As you can see, the power increases slowly and is capped at around 0.8, which looks different from the graph in the paper. Is there any setting I should be aware of to get better power?
rm(list=ls())
library(BicliqueRT)
library(doParallel)
library(doRNG)

set.seed(1)

N = 300 # total number of units
K = 20  # 20, 30, 75; total number of households, i.e., number of clusters

num_randomizations = 5e3
sim_randomizations = 2e3
tau_length = 50
alpha = 0.05

housestruct = out_house_structure(N, K, T)
treateffect = seq(0, 1, length.out = tau_length)
# The design function:
design_fn = function() {
  # one unit from half of the households will be treated
  treatment_list = out_treat_household(housestruct, K1 = K/2)
  treatment = out_Z_household(N, K, treatment_list, housestruct)
  return(treatment[,'treat'])
}
# The exposure function: the exposure of unit i is z_i + \sum_{j in [i]} z_j,
# where [i] is the cluster that i belongs to.
exposure_i = function(z, i) {
  # find the household that i is in
  house_ind = cumsum(housestruct)
  which_house_i = which.min(house_ind < i)
  # find lower and upper index of [i] in z
  if (which_house_i == 1) {lower_ind = 1} else {lower_ind = house_ind[which_house_i-1] + 1}
  upper_ind = house_ind[which_house_i]
  # calculate exposure
  exposure_z = z[i] + sum(z[lower_ind:upper_ind])
  exposure_z
}
# The null
null_equiv = function(exposure_z1, exposure_z2) {
  ((exposure_z1 == 1) | (exposure_z1 == 0)) & ((exposure_z2 == 1) | (exposure_z2 == 0))
}
# Generate a treatment realization
Z = design_fn() # randomly get one realization

# Generate exposures under the realized Z
Z_exposure = rep(0, N)
for (i in 1:N) { Z_exposure[i] = exposure_i(Z, i) }

# Do biclique decomposition on the null exposure graph
H0 = list(design_fn=design_fn, exposure_i=exposure_i, null_equiv=null_equiv)
bd = biclique.decompose(Z, H0, controls = list(method="greedy", mina=30, num_randomizations = 5e3))
# Define a test statistic: the contrast of means between units with exposure 1
# (untreated in a treated cluster) and units with exposure 0 (untreated in an untreated cluster)
Tstat = function(y, z, is_focal) {
  exposures = rep(0, N)
  for (unit in 1:N) {
    exposures[unit] = exposure_i(z, unit)[1]
  }
  stopifnot("all focals have same exposures" = (length(unique(exposures[is_focal])) > 1))
  mean(y[is_focal & (exposures == 1)]) - mean(y[is_focal & (exposures == 0)])
}
powerbi <- function(Y) {
  # tau=0 is the tau in the null: Y_i(b) = Y_i(a) + tau for all i.
  # Then run the test
  testout = clique_test(Y, Z, Tstat, bd)
  pval = testout$p.value # p-value of the test
  return(pval)
}
test_bi <- function(tt) {
  # simulate an outcome vector for each effect size
  sim = lapply(tt, function(s) out_bassefeller(N, K, Z_exposure, tau_main = s,
                                               housestruct = housestruct,
                                               tau_equal = T)$Yobs) # should this be the right way to construct Y?
  Pvalbi = lapply(sim, powerbi)
  return(unlist(Pvalbi))
}
nCores <- 6 # number of CPUs for parallelization
cl <- makeCluster(nCores)
registerDoParallel(cl)
results.par.bi <- foreach(i = 1:sim_randomizations, .packages = "BicliqueRT", .combine = rbind) %dorng% {
  test_bi(treateffect)
}
pp_bi = colMeans(results.par.bi < alpha)
stopCluster(cl)
# Visualization
pp = data.frame(Power = pp_bi, effect_size = treateffect)
library(ggplot2)
graph <- ggplot(pp, aes(y = Power, x = effect_size)) +
  geom_point() +
  geom_line() +
  labs(x = "Magnitude of the effect", y = "Power") +
  theme_minimal() +
  ggtitle(bquote(paste("Power analysis when number of households is ", .(K),
                       " and number of people is ", .(N))))
print(graph)