Closed biona001 closed 1 year ago
A few more observations
n.threads=1
delta.strong.size=10
and the same problem is still running after one hourDamn I'm inclined to believe there's a 🐛. Thanks for finding this example!! I'll take a look asap. I probably won't be able to get to it till next week though.
Ok started looking into this! I don't think it's a bug anymore. Here are some of my findings:
S
and Ci
are not symmetric (close to symmetric though). I don't know if this matters so much, but for numerical stability, it might be worth symmetrizing: S = (S + t(S)) / 2
(similarly for Ci
).A
is not PSD!
A.dense <- A$to_dense()
A.dense.eig = eigen(A.dense, symmetric=T)
tail(A.dense.eig$values, 10)
[1] -0.01602446 -0.02566999 -0.02675701 -0.02717474 -0.02850788 -0.02912589
[7] -0.03217679 -0.03340235 -0.03437624 -0.03684428
I get very similar results if I symmetrize A.dense
directly.
These eigenvalues are significantly negative. I did not change your code snippet - just added these three lines before calling ghostbasil
.
A
such that I truncate the negative eigenvalues to 0 and then I add 1e-10
to the diagonal to make it slightly PD:
dd = sapply(A.dense.eig$values, function(x) max(x, 0)) + 1e-10
A.dense.trunc = A.dense.eig$vectors %*% (dd * t(A.dense.eig$vectors))
Now, running ghostbasil
gives good result:
result.dense <- ghostbasil(A.dense.trunc, r, alpha=1, delta.strong.size=500, user.lambdas=lambdas,
max.strong.size = nrow(A.dense.trunc), n.threads=1, use.strong.rule=T)
For example, here's the (relative) rsq
:
[1] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[6] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[11] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[16] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[21] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[26] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[31] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[36] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[41] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[46] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[51] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[56] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[61] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[66] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[71] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[76] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[81] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[86] 0.0003982219 0.0044517538 0.0110897322 0.0204761087 0.0345857061
[91] 0.0526677093 0.0725795312 0.0970495241 0.1328570606 0.1861651392
[96] 0.2550997130 0.3586244934 0.5188728742 0.7549665432 1.0000000000
I suspect that something's weird in the construction of Ci
or S
because the group-knockoff construction should ensure that the knockoff matrix A
is PSD at the very least. When you solved for S
, were you assuming only 1 knockoff or something?
Hi James, thanks for taking a look. A
definitely should be PSD, so I must have messed up somewhere, let's see...
After reversing lambda and making sure A
is PSD, the same problem now runs <1 second. Thanks for the insights, and sorry for the troubles.
Hi James,
Here is an example where GhostBasil runs slowly. This region has 190 variables, and I generate 5 knockoff copies, so overall
A
is 1140*1140. On sherlock with 12 cores, it takes ~200 seconds to converge.Could you check if the output below matches your expectation? Hopefully I didn't misuse your package in some way.
Some observations:
result$error
is""
)result$rsqs
returns a vector ofNaN