JonJala / mama

MIT License

Issue with cross-ancestry LD score computation - very slow #29

Closed: bishalth01 closed this issue 1 year ago

bishalth01 commented 1 year ago

Hi,

First of all, thanks for the wonderful tool. We are trying to perform a cross-ancestry meta-analysis with MAMA, but we found that generating the cross-ancestry LD scores is very slow, even when each of the 22 chromosomes is run as a separate job. Please see the attached log for chromosome 1.

It has already been about 6 days, and we still have not obtained the first set of scores (AFR-AFR). We are using 4 populations (AFR, AMR, SAS, and EUR), so after that we still need the scores for AMR-AMR, AFR-AMR, AFR-EUR, and many more pairs. At this rate, chromosome 1 alone (~270K SNPs, ~2,000 subjects) will take more than 30-35 days.
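For reference, 4 ancestries give 4 same-ancestry pairs and 6 cross-ancestry pairs, 10 in total. A minimal Python sketch that enumerates them, assuming one set of LD scores is estimated per unordered ancestry pair (as the AFR-AFR step in the log suggests):

```python
from itertools import combinations_with_replacement

# Ancestry labels taken from the log (.snplist.AFR, .AMR, .EUR, .SAS).
ANCESTRIES = ["AFR", "AMR", "EUR", "SAS"]


def ancestry_pairs(ancestries):
    """Return every unordered ancestry pair, including same-ancestry pairs
    such as AFR-AFR (assumption: one LD score set is estimated per pair)."""
    return [f"{a}-{b}" for a, b in combinations_with_replacement(ancestries, 2)]


pairs = ancestry_pairs(ANCESTRIES)
print(len(pairs))  # 10
print(pairs)
```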

Do you have any suggestions for making this process faster? In total there are only ~3M SNPs and ~2,000 subjects, and we are running on a cluster with a large amount of memory, yet performance is still very slow. We would really appreciate any advice.

```
2022/10/13 08:23:26 PM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MAMA: Multi-Ancestry Meta-Analysis
<> Version: 1.0.0
<> (C) 2020 Grant Goldman, Hui Li, Alicia Martin, Patrick Turley and Raymond Walters
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Software-related correspondence: grantgoldman0@gmail.com or jjala.ssgac@gmail.com
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mama_ldscore.py \
--out ldscore_chr_1 \
--stream-stdout \
--snp-ances snp_ances_file1 \
--ances-path iid_ances_file \
--bfile-merged-path \MERGED.chr1.ALLPOP \
--ld-wind-cm 1.0

2022/10/13 08:23:26 PM Beginning to estimate cross-ancestry LD scores...
2022/10/13 08:23:26 PM Read list of 1999 individuals from MERGED.chr1.ALLPOP.fam
2022/10/13 08:23:26 PM Read list of 270568 SNPs from MERGED.chr1.ALLPOP.bim
2022/10/13 08:23:27 PM There are 0 SNPs in the merged .bim file without identified source of ancestry groups. These variants are dropped in the LD score estimation.
2022/10/13 08:23:27 PM <><><<>><><><><><><><><><><><><><><><>
2022/10/13 08:23:27 PM Read list of 270569 SNPs to include from ldscore_chr_1.snplist
2022/10/13 08:23:28 PM Read list of 270569 SNPs to include from ldscore_chr_1.snplist.AFR
2022/10/13 08:23:29 PM Read list of 270569 SNPs to include from ldscore_chr_1.snplist.AMR
2022/10/13 08:23:30 PM Read list of 270569 SNPs to include from ldscore_chr_1.snplist.EUR
2022/10/13 08:23:31 PM Read list of 270569 SNPs to include from ldscore_chr_1.snplist.SAS
2022/10/13 08:23:31 PM Read list of 662 individuals to include from ldscore_chr_1.indlist.AFR
2022/10/13 08:23:31 PM Read list of 348 individuals to include from ldscore_chr_1.indlist.AMR
2022/10/13 08:23:31 PM Read list of 503 individuals to include from ldscore_chr_1.indlist.EUR
2022/10/13 08:23:31 PM Read list of 490 individuals to include from ldscore_chr1.indlist.SAS
2022/10/13 08:23:31 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2022/10/13 08:23:31 PM Estimating LD Score.
2022/10/13 08:23:31 PM Reading genotypes from MERGED.chr1.ALLPOP.bed for LD estimation based on AFR-AFR
2022/10/13 08:23:43 PM After filtering, 661 individuals remain
2022/10/13 08:23:47 PM After filtering, 268791 SNPs remain
2022/10/13 08:23:48 PM Begin calculating LD scores based on AFR-AFR
2022/10/13 08:24:00 PM After filtering, 661 individuals remain
2022/10/13 08:24:04 PM After filtering, 268791 SNPs remain
```
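Because each chromosome is run as its own job, the 22 calls can be generated from the chromosome 1 call shown in the log and submitted to the cluster in parallel. A minimal Python sketch: the flags are copied from the logged call, while the per-chromosome file names (`snp_ances_file{chrom}`, `MERGED.chr{chrom}.ALLPOP`) and the helper `ldscore_command` are hypothetical and assume the same naming pattern as chromosome 1.

```python
import shlex


def ldscore_command(chrom: int) -> list:
    """Hypothetical helper: build the mama_ldscore.py call for one chromosome.

    Flags are copied from the logged chromosome 1 call; the per-chromosome
    file names are an assumed naming pattern, not taken from the issue.
    """
    return [
        "./mama_ldscore.py",
        "--out", f"ldscore_chr_{chrom}",
        "--stream-stdout",
        "--snp-ances", f"snp_ances_file{chrom}",
        "--ances-path", "iid_ances_file",
        "--bfile-merged-path", f"MERGED.chr{chrom}.ALLPOP",
        "--ld-wind-cm", "1.0",
    ]


# Print one shell-quoted command line per autosome; each line can be
# submitted as a separate cluster job.
for chrom in range(1, 23):
    print(shlex.join(ldscore_command(chrom)))
```

`shlex.join` (Python 3.8+) quotes each argument, so the printed lines can be pasted into a job script as-is.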