Open gabeerion opened 3 years ago
What I implemented:
`glrtTorchCis`, a function to get 95% confidence intervals on all coefficients. We say H0 ("all coefficients are zero") is rejected if any confidence interval does not contain 0.

Plots:
`d{}Beta{}.png` are generated from `GLRT_analytic_test.ipynb`. They represent tests with a d-dimensional coefficient vector, with the true coefficient vector having Beta in all components. For each coefficient, we plot the p-value (from the analytic version) on the x-axis and the confidence interval (from the empirical version) on the y-axis. The confidence interval is colored green if the empirical and analytic tests agree on whether to reject H0, and red if they disagree.

`True{}-dim{}.png` are generated from `GLRT Coverage.ipynb`. They plot the same thing as above, but report the coverage of the empirical confidence intervals over 1000 draws. `TrueZero` plots have beta = 0, and `TrueNonzero` plots have beta ~ N(0, 1). Note that the CIs for the latter case are very small, so they don't show up in the current plots. :p

Observations:
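For reference, here is a minimal stand-in for the kind of coverage check the coverage notebook runs. The notebook's internals aren't shown in this issue, so this sketch assumes a plain linear model with known noise variance 1 and normal-theory per-coefficient CIs; the actual `glrtTorchCis` intervals may be computed differently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, n_draws, alpha = 200, 5, 1000, 0.05
beta_true = np.zeros(d)  # "TrueZero" setting; use rng.standard_normal(d) for "TrueNonzero"
z = stats.norm.ppf(1 - alpha / 2)

covered = np.zeros(d)
for _ in range(n_draws):
    X = rng.standard_normal((n, d))
    y = X @ beta_true + rng.standard_normal(n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    # standard errors from diag((X^T X)^{-1}); noise variance assumed known = 1
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
    lo, hi = beta_hat - z * se, beta_hat + z * se
    covered += (lo <= beta_true) & (beta_true <= hi)

coverage = covered / n_draws  # per-coefficient coverage, should sit near 0.95
```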
Not sure what to do with this information, although it's probably worthwhile to re-run these notebooks with the more exhaustive search procedure.
Is this a multiple testing problem? The argument would be that the "empirical" method is performing d tests ("Is coefficient 1 zero?", "Is coefficient 2 zero?", etc.), and that these tests are roughly independent, so the probability of rejecting at least one goes to 1 as the number of coefficients increases: with per-test level alpha, the family-wise error rate is 1 - (1 - alpha)^d.
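The blow-up in the family-wise error rate is easy to check directly. The sketch below computes 1 - (1 - alpha)^d and compares it to a simulation of d independent z-tests under H0 (idealized independent tests, not the repo's actual estimator):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims = 0.05, 20000

for d in (1, 5, 20, 100):
    # analytic FWER for d independent level-alpha tests
    fwer = 1 - (1 - alpha) ** d
    # simulation: reject if any of d independent z-statistics is significant
    z = rng.standard_normal((n_sims, d))
    reject = np.mean(np.any(np.abs(z) > stats.norm.ppf(1 - alpha / 2), axis=1))
    print(f"d={d:3d}  analytic FWER={fwer:.3f}  simulated={reject:.3f}")
```

By d = 100 the chance of at least one spurious rejection is essentially 1, consistent with the multiple-testing reading of the red intervals.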
How would we fix this? We could apply a multiple testing correction (e.g. Bonferroni, testing each coefficient at level alpha/d), but honestly, I'm not sure the corrected test would then be the same as the multivariate GLRT in terms of power.
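One way to probe the power question is a toy comparison in the idealized Gaussian setting (z-statistics with unit variance, dense moderate signal, values chosen arbitrarily for illustration): Bonferroni-corrected per-coefficient tests versus the chi-square sum test that the GLRT reduces to here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, alpha, n_sims = 20, 0.05, 5000
beta = np.full(d, 0.5)  # hypothetical dense alternative: every coefficient = 0.5

z = rng.standard_normal((n_sims, d)) + beta  # z_i ~ N(beta_i, 1)

# Bonferroni: reject if any |z_i| exceeds the two-sided alpha/d threshold
thresh = stats.norm.ppf(1 - alpha / (2 * d))
power_bonf = np.mean(np.any(np.abs(z) > thresh, axis=1))

# GLRT-style test: reject if sum of z_i^2 exceeds the chi2(d) quantile
power_chi2 = np.mean(np.sum(z**2, axis=1) > stats.chi2.ppf(1 - alpha, d))

print(f"Bonferroni power: {power_bonf:.3f}, chi-square power: {power_chi2:.3f}")
```

In this dense-signal regime the chi-square test tends to come out ahead; for a sparse alternative (one large coefficient) the ranking can flip, which is exactly the sense in which the corrected per-coefficient test is not "the same test".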
H0 = "all beta are zero": by Wilks' theorem, the GLRT statistic 2(loglik_full - loglik_null) should be chi-squared with d degrees of freedom.
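A minimal sketch of that analytic test, using linear regression with known noise variance 1 (in that case the GLRT statistic is exactly the drop in residual sum of squares, and is exactly chi2(d) under H0; this is an illustrative setting, not the repo's model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d = 500, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)  # H0 holds: true beta = 0, noise variance 1 (known)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_full = np.sum((y - X @ beta_hat) ** 2)
rss_null = np.sum(y**2)  # null model: all coefficients fixed at zero

# GLRT statistic = 2 * (loglik_full - loglik_null) = RSS_null - RSS_full here
glrt_stat = rss_null - rss_full
p_value = stats.chi2.sf(glrt_stat, df=d)  # reference distribution: chi2(d)
```

Under H0 this p-value is uniform on [0, 1], which is the calibration the `d{}Beta{}.png` plots are comparing the empirical CIs against.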