dpuelz / BicliqueRT

Randomization Tests of Causal Effects under General Interference
https://arxiv.org/pdf/1910.10862.pdf

Size and power of spatial interference #23

Closed samzl1 closed 1 year ago

samzl1 commented 1 year ago

Hi David, Hi szhuang1,

Thank you so much for your help last year with the cluster interference example. It works perfectly right now!

However, I'm playing around with a similar spatial interference example and am running into the same kind of size and power issue I asked you about before. The radius I'm using is 0.04 rather than the 0.02 in your example, and the resulting plot is attached [attachment: spatial]. It seems that both size and power are a bit off, and I noticed that the power is not monotone, which is a bit odd.

I've attached the code I'm using below, and I'd highly appreciate any help you can provide.

Best, Liang

install.packages("devtools")
library(devtools)
install_github("dpuelz/BicliqueRT")

rm(list=ls())
library(BicliqueRT)
set.seed(1)

N = 500 # number of units
thenetwork = out_example_network(N)
D = thenetwork$D

Z = rbinom(N, 1, prob=0.2)

num_randomizations = 5000
radius = 0.04

design_fn = function() { rbinom(N, 1, prob=0.2) }

# spillover graph: units within `radius` of each other interfere
Gr = (D < radius) * 1; diag(Gr) = 0
# exposure of unit i: (any treated neighbor, own treatment)
exposure_i = function(z, i) { c(as.numeric(sum(Gr[i,] * z) > 0), z[i]) }

# H0 equates the two untreated exposures, i.e., no spillover effect on untreated units
null_hypothesis = list(c(0,0), c(1,0))
null_equiv = function(exposure_z1, exposure_z2) {
  (list(exposure_z1) %in% null_hypothesis) & (list(exposure_z2) %in% null_hypothesis)
}

H0 = list(design_fn=design_fn, exposure_i=exposure_i, null_equiv=null_equiv)
bd = biclique.decompose(Z, H0, controls=list(method="greedy", mina=20, num_randomizations=2e3))
m = bd$MNE # this gives the biclique decomposition

# difference in means between focal units with and without a treated neighbor
Tstat = function(y, z, is_focal) {
  exposures = rep(0, N)
  for (unit in 1:N) {
    exposures[unit] = exposure_i(z, unit)[1]
  }
  stopifnot("all focals have same exposures" = (length(unique(exposures[is_focal])) > 1))
  abs(mean(y[is_focal & (exposures == 1)]) - mean(y[is_focal & (exposures == 0)]))
}

powerbi <- function(Y) {
  testout = clique_test(Y, Z, Tstat, bd)
  pval = testout$p.value # p-value of the test
  return(pval)
}

# simulate outcomes with additive spillover effect s; return the p-values over the grid
test_bi <- function(tt) {
  sim = lapply(tt, function(s) rnorm(N) + s * as.numeric(Gr %*% Z > 0))
  Pvalbi = lapply(sim, powerbi)
  return(unlist(Pvalbi))
}

alpha = 0.05
sim_randomizations = 2e3
tau_length = 50

treateffect = seq(0, 1, length.out = tau_length)

library(doParallel)
library(doRNG)

nCores <- 6 # number of CPUs for parallelization
cl <- makeCluster(nCores)
registerDoParallel(cl)
results.par.bi <- foreach(i=1:sim_randomizations, .packages="BicliqueRT", .combine=rbind) %dorng% {
  test_bi(treateffect)
}

pp_bi = colMeans(results.par.bi < alpha)

stopCluster(cl)

# visualization
pp = data.frame(Power=pp_bi, effect_size=treateffect)

library(ggplot2)
graph <- ggplot(pp, aes(y=Power, x=effect_size)) +
  geom_point() +
  geom_line() +
  labs(x = "Magnitude of the effect", y = "Power") +
  theme_minimal() +
  geom_hline(yintercept = alpha, linetype = 2, color='red') +
  ggtitle(bquote(paste("Power analysis when spillover threshold is ", .(radius))))

print(graph)

samzl1 commented 1 year ago

Feel free to let me know if you have any questions about the simulation code above or if you have a different result using the code. Any help would be highly appreciated!

dpuelz commented 1 year ago

Hi @samzl1, apologies for the delay. We are working on this.

samzl1 commented 1 year ago

Hi @dpuelz, any updates?

dpuelz commented 1 year ago

Hey! We are looking into it this week. Thanks for your patience.


szhuang1 commented 1 year ago

Hi @samzl1, sorry for the delay. Here are my thoughts on the issue.

In terms of size: on my side, when I increase the number of simulations to 1e4, the rejection rate at effect = 0 is 0.054, which I think is fine, and I believe it will get closer to 0.05 as the number of simulations grows further.
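As a quick sanity check on the Monte Carlo error (a back-of-the-envelope calculation, not part of the package):

n_sim <- 1e4
mc_se <- sqrt(0.05 * 0.95 / n_sim)     # ~0.0022
c(0.05 - 2 * mc_se, 0.05 + 2 * mc_se)  # ~ (0.0456, 0.0544)
# 0.054 lies just inside two standard errors of the nominal 0.05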

In terms of power, I think there are two possible reasons, both related to the biclique. The first is the size of the biclique. In the earlier clustered interference example (N=300, K=20), we usually get a biclique of size (# focal units x # focal assignments) of at least 100 x 30. Here, with mina=20, num_randomizations=2e3, the biclique is only 20 x 20 (note that although you set num_randomizations=5000 at the top of the script, that value is never passed to biclique.decompose, so bd is built from only 2e3 randomizations). As discussed in Sec. 5 and Fig. 4 of the paper, the size of the biclique matters for the power of the test, and the power is usually unsatisfactory when the biclique is as small as it is here.
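For example, a minimal sketch reusing the variables from your script (the exact layout of bd$MNE is package-internal, so str here is just to inspect it in your session):

# pass the intended number of randomizations into the decomposition controls;
# the top-level num_randomizations = 5000 is otherwise never used
bd = biclique.decompose(Z, H0,
                        controls = list(method = "greedy", mina = 20,
                                        num_randomizations = num_randomizations))
str(bd$MNE, max.level = 1)  # how many bicliques there are, and how large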

That said, it is not always true that a larger biclique gives a more powerful test. I increased num_randomizations and mina and obtained larger bicliques, but these gave over-conservative results in which the power increased even more slowly than in your result. This is probably due to the proportions of the two exposures within the biclique, which is the second reason I want to mention. In this example, an ideal biclique (one that can probably give a more powerful test) should have roughly equal numbers of (exposures == 1) and (exposures == 0) focal units, as used in the Tstat function. But after running the decomposition many times and under different specifications of the decomposition algorithm, I find that the focal assignments in the biclique, including the one from your code, usually give 70% to 80% (exposures == 1) focal units, which is not desirable. By contrast, in the clustered interference example (N=300, K=20) the proportion of (exposures == 1) is close to 50%. Below is some simple code from which you can see the proportions of (exposures == 1) units in the biclique bd.

# examining proportions of the two exposures among focal units
exposures_proportions = function(y, z, is_focal) {
  exposures = rep(0, N)
  for (unit in 1:N) {
    exposures[unit] = exposure_i(z, unit)[1]
  }
  # error out if all focal units share the same exposure
  stopifnot("all focals have same exposures" = (length(unique(exposures[is_focal])) > 1))
  # share of focal units with exposures == 1
  sum(is_focal & (exposures == 1)) / sum(is_focal)
}
# reuse clique_test with the proportion as the "test statistic": statistic is the share
# under the observed assignment, statistic.dist the shares under the focal assignments
testout_expprop = clique_test(rnorm(N), Z, exposures_proportions, bd)
out_exposures_proportions = c(
  testout_expprop$statistic,
  testout_expprop$statistic.dist
)
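A rough summary of these values (assuming, as above, that the vector stacks the observed share first and then the shares across the focal assignments):

summary(out_exposures_proportions)  # distribution of the (exposures == 1) share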

Unfortunately, these two issues, and the power of the test more generally, can depend on many application-specific factors, and in some cases like this one the power increases slowly with the magnitude of the effect. On my side, when the magnitude is 3 the test rejects 100% of the time. I'm afraid this is sometimes the price to pay for a general method that covers many different spillover settings like ours. I understand this does not fully solve your problem, but for particular applications or experimental designs there may exist case-specific methods that give a more powerful test, and you can run similar power simulations to decide which method to use.
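If you want to reproduce this on your end, the only change needed in the simulation script above is a wider effect grid:

treateffect = seq(0, 3, length.out = tau_length)  # largest magnitude now 3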

Hope these can help and feel free to comment if anything is unclear or you have further questions.

samzl1 commented 1 year ago

Thank you so much for helping me play around with the power analysis! I learned a lot about how power is affected in the testing framework.

samzl1 commented 1 year ago

@dpuelz @szhuang1 One more question, regarding the power in Figure 3 (the clustered interference simulation in Section 6). I see that the plain biclique method performs worse as the number of households increases, which motivates the design-assisted biclique tests. However, I found that the design-assisted biclique test has better power when the number of households increases from 20 to 75. Do you know whether this is necessarily the case? And is it only due to the increase in the number of focal units?

I'm trying to understand the power property of the test better, so I highly appreciate any help you can provide.

dpuelz commented 1 year ago

Hi @samzl1, I'm not sure it will necessarily be the case. One key metric to look at is the average number of focal units per cluster; a rough way to compute it is sketched below. My hunch is that this number will be larger for K=20 than for K=75, but I am not sure. Did you run a simulation?
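Something like the following, where is_focal (a logical over units) and cluster_id (mapping units to households) are placeholders for whatever your clustered-interference setup produces; neither appears in the spatial script above:

# hypothetical helper: average number of focal units per cluster
avg_focals_per_cluster <- function(is_focal, cluster_id) {
  mean(tapply(is_focal, cluster_id, sum))
}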

samzl1 commented 1 year ago

@dpuelz I agree the number of focals per cluster would be larger for K=20 vs K=75, so it seems the power would be larger for the K=20 case. However, the design-assisted method favored K=75 over K=20. I haven't run the simulation yet. Thanks again for your feedback.