choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

PRSice MAF and Filtering Step Slow #360

Open zhanglucas opened 4 months ago

zhanglucas commented 4 months ago

Hi,

I'm trying to run PRSice on UKBB BGEN data, and I've noticed that the "Calculate MAF and perform filtering on target SNPs" step is extremely slow: I've let it run for nearly a week and this step still hasn't finished. I've also noticed that even with the --thread option set, it only uses one thread during this step. Is there anything I can do to speed it up?

Here are my run options:

    PRSice2 \
        --a1 A1 \
        --a2 A2 \
        --allow-inter \
        --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
        --base /home/ukbiobank/PRS_SumStats_Pheno/rename_HF_bothsex_df_cleaned_reformat_without_NaN.assoc \
        --beta \
        --binary-target T \
        --bp BP \
        --chr CHR \
        --clump-kb 250kb \
        --clump-p 1.000000 \
        --clump-r2 0.100000 \
        --dose-thres 0.000000 \
        --extract /home/lzhang@TENAYA.local/ukbiobank/PRS_SumStats_Pheno/HF_snp_extract.txt \
        --hard \
        --hard-thres 0.100000 \
        --id-delim "_" \
        --interval 5e-05 \
        --lower 5e-08 \
        --num-auto 22 \
        --out HF_All_GBMI_with_HCM \
        --pheno /home/ukbiobank/PRS_SumStats_Pheno/HF_reordered_pheno.pheno \
        --print-snp \
        --pvalue P \
        --seed 2781250093 \
        --snp SNP \
        --stat BETA \
        --target /home/ukbiobank/Genotypes/ukb21007_c#_b0_v1.recoded,/home/ukbiobank/Genotypes/ukb21007_c1_b0_v1.recoded.sample \
        --thread 23 \
        --type bgen \
        --ultra \
        --upper 0.5

choishingwan commented 4 months ago

Unfortunately, my implementation of the BGEN format isn't very efficient. The best option would be to convert the BGEN files to PLINK format first. Usually the performance difference is not that large.
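A minimal sketch of that conversion, assuming plink2 is installed and on PATH. It builds one plink2 call per autosome, substituting the chromosome number for `#` in the same way the `--target` prefix above does, and can run a few chromosomes at once, since the slow PRSice step was using only one thread. The helper names `conversion_commands` and `run_all`, the output prefix `ukb21007_c#_hard`, the assumption that each chromosome's file is the target prefix plus `.bgen`, and the `ref-first` REF/ALT mode are illustrative assumptions, not details from this thread.

```python
"""Build (and optionally run) plink2 commands converting per-chromosome BGEN to PLINK .bed/.bim/.fam.

Assumptions (hypothetical, not from the thread): plink2 is on PATH, each chromosome
is stored as '<prefix>.bgen' with '#' standing for the chromosome number, and a
single .sample file is shared by all chromosomes.
"""
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def conversion_commands(target_prefix, sample_file, out_prefix,
                        num_auto=22, hard_call_threshold=0.1, threads=1):
    """Return one plink2 argument list per autosome (1..num_auto)."""
    cmds = []
    for chrom in range(1, num_auto + 1):
        bgen = target_prefix.replace("#", str(chrom)) + ".bgen"
        out = out_prefix.replace("#", str(chrom))
        cmds.append([
            "plink2",
            # REF/ALT mode 'ref-first' is an assumption; check it against your data.
            "--bgen", bgen, "ref-first",
            "--sample", sample_file,
            # Dosages too far from a whole-number genotype become missing hard calls.
            # 0.1 is assumed to roughly match --hard-thres 0.1 in the PRSice run;
            # the two programs may define their thresholds differently.
            "--hard-call-threshold", str(hard_call_threshold),
            "--threads", str(threads),
            "--make-bed",
            "--out", out,
        ])
    return cmds


def run_all(cmds, parallel_jobs=4):
    """Execute the commands, running up to parallel_jobs chromosomes at a time."""
    def run(cmd):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError if plink2 fails

    with ThreadPoolExecutor(max_workers=parallel_jobs) as pool:
        list(pool.map(run, cmds))  # list() surfaces any exception from a worker


if __name__ == "__main__":
    cmds = conversion_commands(
        "/home/ukbiobank/Genotypes/ukb21007_c#_b0_v1.recoded",
        "/home/ukbiobank/Genotypes/ukb21007_c1_b0_v1.recoded.sample",
        "ukb21007_c#_hard",
    )
    # Dry run: print the first two commands instead of executing them.
    for cmd in cmds[:2]:
        print(shlex.join(cmd))
    # run_all(cmds)  # uncomment to actually convert all chromosomes
```

After conversion, the PRSice run could presumably point `--target` at the new prefix (here `ukb21007_c#_hard`) with `--type bed`, which is PRSice's default file type; the BGEN-specific options such as `--hard`, `--hard-thres`, and `--dose-thres` would then no longer apply.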

Sam

(In reply to Lucas Zhang's email of Tue, Jul 9, 2024, 1:02 PM.)
