choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0
185 stars, 87 forks

PRSice/2.3.5 slow when computing allele frequency from plink file #281

Closed. complexgenome closed this issue 3 years ago

complexgenome commented 3 years ago

Dear developer,

I previously used PRSice-2 to compute PRS for a binary trait, and I am now using it for a continuous trait. The target data contain ~1,750 individuals (IIDs) and ~7M SNPs in the PLINK files (filtered at MAF 1%). The GWAS/base dataset contains 7.8M SNPs.

module load  PRSice/2.3.5

PRSice_linux  \
--base  CCCE_MHAS_model1_rsq80_clean \
--target plink_selected_iids/CHR#_keep_iids_maf01 \
--nonfounder \
--bar-levels 0.00000005,0.00001,0.001,0.01,0.05,0.1,0.2,0.3,0.4,0.5  \
--no-full --fastscore --pheno PHENOtest_clean.txt \
--pheno-col CCCE --binary-target F \
--cov PHENOtest_clean.txt --cov-col PC1,PC2,PC3 --print-snp \
--seed 2397373689 --thread 8 \
--pvalue p.value --snp SNPID --bp POS \
--chr CHR --stat BETA --A2 Allele1 \
--A1 Allele2 --model add --type bed \
--score avg --clump-kb 250 \
--clump-r2 0.1 --clump-p 1 \
--maf 0.01 --perm 10000 \
--all-score --keep testKEEP.txt \
--out PRS_250KB_quantitative

The data are TOPMed-imputed (hg38). After more than 8 hours, the tool has completed only ~8% of the allele frequency step (the log shows "Calculating allele frequencies:").

Is there anything I can do to speed this up? Best,

choishingwan commented 3 years ago

Unfortunately, for now that will be the max speed of PRSice. You can do prefiltering with PLINK (which should be more efficient) and provide PRSice with the extraction / exclusion list to speed things up.

Sam
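
The pre-filtering suggested above could look roughly like the sketch below. It is only an illustration, not the maintainer's own procedure: it assumes PLINK 1.9 is installed as `plink`, that testKEEP.txt is in the FID/IID format PLINK expects for `--keep`, and that the per-chromosome files follow the `CHR#` naming from the original command. The `prefilter` output directory and the `run` / `prefilter_chr` helpers are hypothetical names introduced here. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: file names follow the command in this thread;
# "prefilter" is a hypothetical output directory.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prefilter_chr() {
    # $1 = chromosome number; PLINK writes prefilter/chr<N>.snplist
    run plink --bfile "plink_selected_iids/CHR${1}_keep_iids_maf01" \
        --keep testKEEP.txt --maf 0.01 \
        --write-snplist --out "prefilter/chr${1}"
}

run mkdir -p prefilter
for chr in $(seq 1 22); do
    prefilter_chr "$chr"
done

# Merge the per-chromosome SNP lists into one extraction list for PRSice.
if [ "$DRY_RUN" != "1" ]; then
    cat prefilter/chr*.snplist > prefilter/extract.snplist
fi
```

The merged list can then be passed to PRSice by adding `--extract prefilter/extract.snplist` to the original command and dropping `--maf 0.01`, since the MAF filter has already been applied by PLINK.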

complexgenome commented 3 years ago

I have already included/excluded the individuals and SNPs of interest in the PLINK files. If I recall correctly, this step was very fast in earlier versions of PRSice-2.

choishingwan commented 3 years ago

From the log, it seems you did not provide PRSice with the extraction / exclusion list, which I guess is why it is slow. Also, if you have already pre-filtered with PLINK, you don't need to re-run --maf, as that will just slow things down. Hope this helps.

complexgenome commented 3 years ago

Got it; that helped tremendously. I removed the --maf flag.