JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

maxFDR calculations run forever #157

Open lubertorubior opened 2 years ago

lubertorubior commented 2 years ago

Hello,

Thank you for creating mtag! We are using your software with great results. However, we are stuck in the maxFDR calculations part which runs endlessly after giving this message:

Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time..

We are running this on a dual Intel(R) Xeon(R) Platinum 8358 workstation with 2TB of RAM so memory should not be a problem here. Could this be anything related to the number of traits (T=11) which results in a very high number of operations?

How could we perform these calculations in a proper manner?

Thank you!

Best, Luis.

paturley commented 2 years ago

Hi Luis,

Yes, it's because you are trying to calculate maxFDR with 11 traits. MaxFDR calculates the FDR for every feasible mixture distribution corresponding to a set of fixed gridpoints. I forget how tight the default resolution is, but I think it checks everything in 0.1 unit intervals for each trait, meaning that it needs to evaluate the FDR ~11^11 = 285 billion times. It's not a high memory process, but it is a lot of operations.
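To make the arithmetic above concrete, here is a minimal sketch of the count. It assumes, as the comment guesses, a grid of 0.0, 0.1, ..., 1.0 (11 points) per trait with every combination evaluated. The function name `grid_evaluations` is made up for illustration and is not part of mtag; the real code may skip infeasible combinations, so read the result as an upper bound.

```python
# Back-of-the-envelope count of maxFDR grid evaluations.
# Assumption (taken from the comment above, not checked against the mtag
# source): each trait gets 11 grid points (0.0, 0.1, ..., 1.0) and every
# combination across traits is evaluated.

def grid_evaluations(n_traits, points_per_trait=11):
    """Upper bound on the number of grid combinations for n_traits traits."""
    return points_per_trait ** n_traits


if __name__ == "__main__":
    for n_traits in (2, 3, 5, 11):
        print(f"T={n_traits:2d}: {grid_evaluations(n_traits):,} evaluations")
```

The count grows exponentially in the number of traits: two traits need 121 evaluations under these assumptions, while eleven need 285,311,670,611.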

When I designed maxFDR, I never really anticipated people using it for more than a few traits at a time, so while MTAG is very scalable, maxFDR is not.

I can imagine a couple options if you have this many data sets:

1) Calculate the maxFDR pairwise for each pair of traits and take the max of that
2) Reduce the number of grid points to be something like {.1, .5, .9} for each trait and understand that your maxFDR estimate will be biased downward
3) Just omit the maxFDR analyses and hope your reviewers don't mind. If you have a replication sample, that's even better than maxFDR anyways
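A rough sketch of how the first two options change the amount of work, and how the pairwise maximum could be collected. Everything here is illustrative: `max_fdr_for_pair` is a hypothetical callable standing in for however you obtain the maxFDR of one two-trait run (for example, by parsing the log of a separate MTAG run). It is not an mtag function. The cost figures assume 11 grid points per trait for the default grid and 3 for the coarse grid. The pairwise maximum is a workaround and is not guaranteed to equal the joint maxFDR over all traits.

```python
from itertools import combinations
from math import comb


def pairwise_max_fdr(traits, max_fdr_for_pair):
    """Option 1: take the largest maxFDR over all two-trait runs.

    max_fdr_for_pair is a user-supplied (hypothetical) callable that returns
    the maxFDR obtained from running the analysis on traits a and b only.
    """
    return max(max_fdr_for_pair(a, b) for a, b in combinations(traits, 2))


def pairwise_cost(n_traits, points_per_trait=11):
    """Grid evaluations for option 1: one 2-trait grid per pair of traits."""
    return comb(n_traits, 2) * points_per_trait ** 2


def coarse_grid_cost(n_traits, points_per_trait=3):
    """Grid evaluations for option 2: e.g. {.1, .5, .9} for every trait."""
    return points_per_trait ** n_traits


if __name__ == "__main__":
    print(f"pairwise, 11 traits:    {pairwise_cost(11):,}")
    print(f"coarse grid, 11 traits: {coarse_grid_cost(11):,}")
```

Under these assumptions, eleven traits give 55 pairs and 6,655 evaluations for the pairwise approach, and 177,147 for the three-point grid, against roughly 285 billion for the full grid.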

There may be other creative options too, but I can't think of them off the top of my head.

Patrick

Sorry.

On Wed, May 4, 2022 at 10:46 AM lubertorubior @.***> wrote:
