JamesYang007 / ghostbasil

GhostKnockoff + BASIL

GhostBasil very slow for p=1140 problem #33

Closed biona001 closed 1 year ago

biona001 commented 1 year ago

Hi James,

Here is an example where GhostBasil runs slowly. This region has 190 variables, and I generate 5 knockoff copies, so overall A is 1140 x 1140 (190 x 6, counting the original variables plus the 5 knockoffs). On Sherlock with 12 cores, it takes ~200 seconds to converge.

Could you check if the output below matches your expectation? Hopefully I didn't misuse your package in some way.

library(Matrix)
library(ghostbasil)

# read data
Ci <- as.matrix(read.table('Ci.txt'))
S <- as.matrix(read.table('S.txt'))
r <- scan('r.txt')
lambdas <- scan('lambdas.txt')

# form ghostbasil inputs
S <- BlockMatrix(list(S))            # dim(S) = 190 by 190
A <- BlockGroupGhostMatrix(Ci, S, 6) # dim(A) = 1140 by 1140

# run ghostbasil
result <- ghostbasil(A, r, delta.strong.size=500, user.lambdas=lambdas,
    max.strong.size=nrow(A), n.threads=12, use.strong.rule=FALSE)

Some observations:

biona001 commented 1 year ago

A few more observations

JamesYang007 commented 1 year ago

Damn I'm inclined to believe there's a 🐛. Thanks for finding this example!! I'll take a look asap. I probably won't be able to get to it till next week though.

JamesYang007 commented 1 year ago

Ok started looking into this! I don't think it's a bug anymore. Here are some of my findings:

I suspect that something's weird in the construction of Ci or S because the group-knockoff construction should ensure that the knockoff matrix A is PSD at the very least. When you solved for S, were you assuming only 1 knockoff or something?
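For reference, a minimal sketch of the kind of PSD check discussed here, in base R on a toy 2 x 2 example. It assumes (this is not confirmed in the thread) that the matrix built from Ci and S has S + Ci on its diagonal blocks and Ci on its off-diagonal blocks, with Ci = C - S. The helpers `min_eig` and `dense_ghost` are hypothetical names and are not part of ghostbasil. Under that assumed structure, A is PSD exactly when both S and S + n_groups * Ci are PSD, so an S that is valid for a single knockoff can still fail with 5 knockoffs.

```r
# Hypothetical helper (not part of ghostbasil): smallest eigenvalue of a
# symmetric dense matrix; symmetrized first to absorb round-off asymmetry.
min_eig <- function(M) {
  min(eigen((M + t(M)) / 2, symmetric = TRUE, only.values = TRUE)$values)
}

# Hypothetical helper: dense matrix with the ASSUMED block structure
#   diagonal blocks     = S + Ci
#   off-diagonal blocks = Ci
# written as kron(ones, Ci) + kron(identity, S).
dense_ghost <- function(Ci, S, n_groups) {
  kronecker(matrix(1, n_groups, n_groups), Ci) + kronecker(diag(n_groups), S)
}

# Toy covariance and two candidate S matrices (made-up numbers).
C     <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
S_ok  <- diag(0.5, 2)  # small enough for 5 knockoffs
S_bad <- diag(0.9, 2)  # fine for 1 knockoff, too large for 5

tol <- 1e-8
psd_ok_6  <- min_eig(dense_ghost(C - S_ok,  S_ok,  6)) > -tol  # original + 5 knockoffs
psd_bad_6 <- min_eig(dense_ghost(C - S_bad, S_bad, 6)) > -tol  # original + 5 knockoffs
psd_bad_2 <- min_eig(dense_ghost(C - S_bad, S_bad, 2)) > -tol  # original + 1 knockoff

print(c(ok_6 = psd_ok_6, bad_6 = psd_bad_6, bad_2 = psd_bad_2))
```

For the real data, the same check would be run on the dense Ci and S read from the text files, before they are wrapped in BlockMatrix and BlockGroupGhostMatrix.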

biona001 commented 1 year ago

Hi James, thanks for taking a look. A definitely should be PSD, so I must have messed up somewhere, let's see...

biona001 commented 1 year ago

After reversing lambda and making sure A is PSD, the same problem now runs in under 1 second. Thanks for the insights, and sorry for the trouble.
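A small sketch of the lambda fix mentioned above, assuming the solver is meant to receive the sequence from largest to smallest (the usual convention for lasso path solvers, but not stated explicitly in this thread). The function name `as_decreasing` and the example values are made up for illustration.

```r
# Hypothetical helper: return the lambda sequence in decreasing order,
# leaving it untouched if it is already non-increasing.
as_decreasing <- function(lambdas) {
  # rev(lambdas) is sorted (non-decreasing) iff lambdas is non-increasing
  if (is.unsorted(rev(lambdas))) {
    lambdas <- sort(lambdas, decreasing = TRUE)
  }
  lambdas
}

lambdas_in  <- c(0.01, 0.1, 1)         # made-up values, increasing order
lambdas_out <- as_decreasing(lambdas_in)
print(lambdas_out)
```

The result can then be passed as user.lambdas in the ghostbasil call from the original post.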