JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

using cores for maxFDR #120

Open anujgoel1 opened 3 years ago

anujgoel1 commented 3 years ago

Hi, I am running MTAG on a 64-bit, 15-core AMD processor under CentOS 8 with Python 2.7.17. I have set --cores 3 for the maxFDR calculation, but it seems to be using only 1 core. Could you please advise how I can make use of multi-threading to speed up the calculations? Many thanks. Best wishes, Anuj.

JonJala commented 3 years ago

Hi, Anuj -

Just wanted to confirm that you see "Performing grid search using 3 cores" in the log file?

The joblib module handles the job/core allocation, and the number of cores specified is just handed off to joblib as the number of jobs. You could run "import joblib; print joblib.cpu_count()" to double-check that joblib itself realizes there is more than one core on the system.
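
The same check can be written as a small script that runs under both Python 2.7 and Python 3. This is only a sketch: the helper name `visible_cores` is hypothetical, and the fallback to the standard library's `multiprocessing.cpu_count()` when joblib cannot be imported is an assumption added for convenience, not something MTAG itself does.

```python
import multiprocessing


def visible_cores():
    """Return the core count joblib reports, or the stdlib count if joblib is missing."""
    try:
        import joblib
    except ImportError:
        # Assumption: fall back to the standard library when joblib is absent.
        return multiprocessing.cpu_count()
    return joblib.cpu_count()


if __name__ == "__main__":
    # A single formatted string keeps print() valid in Python 2.7 and Python 3.
    print("cores visible: %d" % visible_cores())
```

If this prints 1 on a multi-core machine, the limit is coming from the environment (for example a scheduler or container restriction) rather than from the --cores setting.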

Also, what are you seeing that makes it clear it's only using one core?

anujgoel1 commented 3 years ago

Hi, Thanks for your prompt reply and apologies for my hasty issue creation. I can confirm that "import joblib; print joblib.cpu_count()" gives 16, and that --cores works fine when T=4. I have been watching "top" for the cpu% and for additional rows with the same job ID and time increments, to make sure the multi-threading is working. So for T==4, the log file does say "Performing grid search using 3 cores".

I am having trouble when T==5 where the log is stuck at:

2020/12/09/05:41:11 PM Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time...
2020/12/09/05:41:11 PM T=5

I understand that for T>4 the computation time becomes very large, but I thought that giving it more cores might speed it up. It seems that the step before the grid search uses just 1 core.

I am guessing it is something you have looked into and not feasible? Many thanks again. Best wishes, Anuj.

JonJala commented 3 years ago

Hi, Anuj -

Hmm, from what I understand, the steps before the grid search should generally not be too time-consuming (but apparently that's not the case, at least here). What FDR options / inputs are you using?

anujgoel1 commented 3 years ago

I am doing the analysis in 2 steps.

./mtag.py --sumstats f1,f2,f3,f4,f5 --no_chr_data --out ./run_5
./mtag.py --skip_mtag --cores 3 --out ./run_5

Maybe I should explore the n_approx flag?

paturley commented 3 years ago

Hi Anuj,

I'd recommend trying the n_approx flag, especially if your summary statistics have approximately the same sample size across all SNPs within each trait.

anujgoel1 commented 3 years ago

Thanks a lot for your help. It is still taking time to perform grid search. Kind regards, Anuj.

paturley commented 3 years ago

I'm sure it is. With 5 phenotypes, it's searching along 2^5 = 32 axes of variation and it's likely checking 100 points for each axis. So it is evaluating the FDR at about 100^(2^5) = 100^32 points. That's a huge space. You could reduce the number of points per axis or restrict the space in other ways if you want. Or you could possibly report the maxFDR for each subset of 4 traits if you've gotten that to run fast enough. You'd just have to justify whatever you do to reviewers.
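
To get a feel for those numbers, here is a back-of-the-envelope sketch. It assumes, as the estimate above describes, 2^T axes for T traits with the same number of points on every axis. The function names `grid_points` and `four_trait_subsets` are hypothetical, the file names are the placeholders from the commands earlier in the thread, and none of this is MTAG's own code.

```python
from itertools import combinations


def grid_points(n_traits, points_per_axis):
    """Grid size assuming 2**n_traits axes with points_per_axis points on each."""
    n_axes = 2 ** n_traits
    return points_per_axis ** n_axes


def four_trait_subsets(files):
    """Comma-joined --sumstats arguments for every subset of 4 input files."""
    return [",".join(subset) for subset in combinations(files, 4)]


if __name__ == "__main__":
    for n_traits in (4, 5):
        for points in (100, 10):
            total = grid_points(n_traits, points)
            # The totals here are exact powers of ten, so the digit count gives the exponent.
            exponent = len(str(total)) - 1
            print("T=%d, %d points per axis: 10^%d grid points"
                  % (n_traits, points, exponent))

    # One --sumstats argument per leave-one-out subset of the five inputs.
    for sumstats in four_trait_subsets(["f1", "f2", "f3", "f4", "f5"]):
        print("--sumstats " + sumstats)
```

Under these assumptions, going from T=4 to T=5 at 100 points per axis multiplies the grid by a factor of 10^32, which no realistic number of cores can offset, while cutting to 10 points per axis brings T=5 down to the T=4 count. These are only the counts implied by the rough estimate above; the number of points MTAG actually evaluates may differ.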

Good luck!
