choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Error: ZSTD compression currently unsupported #326

Closed sakuramodokich closed 1 year ago

sakuramodokich commented 1 year ago

Hi Sam,

I was trying to compute PRS from UKBB BGEN v1.2 files but ran into an error using PRSice v2.3.5. Below are my command and its output.

Rscript ./PRSice.R --dir . \
    --prsice ./PRSice_linux --base Roselli_2018_AF_HRC_GWAS_EURv11.txt \
    --snp MarkerName --chr chr --bp pos --A1 Allele1 --A2 Allele2 --beta --stat Effect --pvalue P-value \
    --target-list blist.txt,ukb21008_c1_b0_v1.sample \
    --type bgen \
    --allow-inter \
    --binary-target T \
    --pheno af_df.phe \
    --pheno-col af_cc \
    --extract UKB_imputed.valid \
    --keep af_df_sample_ID.txt \
    --thread 36 \
    --out UKB_imputed \
    --ignore-fid

Output:

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2023-06-29 16:13:56
./PRSice_linux \
    --a1 Allele1 \
    --a2 Allele2 \
    --allow-inter  \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base Roselli_2018_AF_HRC_GWAS_EURv11.txt \
    --beta  \
    --binary-target T \
    --bp pos \
    --chr chr \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --extract UKB_imputed.valid \
    --ignore-fid  \
    --interval 5e-05 \
    --keep af_df_sample_ID.txt \
    --lower 5e-08 \
    --num-auto 22 \
    --out UKB_imputed \
    --pheno af_df.phe \
    --pheno-col af_cc \
    --pvalue P-value \
    --seed 3925567544 \
    --snp MarkerName \
    --stat Effect \
    --target-list blist.txt,ukb21008_c1_b0_v1.sample \
    --thread 36 \
    --type bgen \
    --upper 0.5

Initializing Genotype info from file: blist.txt (bgen) 
With external fam file: ukb21008_c1_b0_v1.sample 

Start processing Roselli_2018_AF_HRC_GWAS_EURv11 
================================================== 

SNP extraction/exclusion list contains 5 columns, will 
assume first column contains the SNP ID 

Base file: Roselli_2018_AF_HRC_GWAS_EURv11.txt 
Header of file is: 
MarkerName      Allele1 Allele2 chr     pos     Effect  StdErr  P-value 

Reading 100.00%
9362422 variant(s) observed in base file, with: 
9296296 variant(s) excluded based on user input 
66126 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

488315 people (223502 male(s), 264624 female(s)) observed 
337053 founder(s) included 

27408K SNPs processed in ukb21008_c1_b0_v1.bgen   
30007K SNPs processed in ukb21008_c2_b0_v1.bgen   
24862K SNPs processed in ukb21008_c3_b0_v1.bgen   
23956K SNPs processed in ukb21008_c4_b0_v1.bgen   
22364K SNPs processed in ukb21008_c5_b0_v1.bgen   
21077K SNPs processed in ukb21008_c6_b0_v1.bgen   
19958K SNPs processed in ukb21008_c7_b0_v1.bgen   
19289K SNPs processed in ukb21008_c8_b0_v1.bgen   
15110K SNPs processed in ukb21008_c9_b0_v1.bgen   
16751K SNPs processed in ukb21008_c10_b0_v1.bgen   
16905K SNPs processed in ukb21008_c11_b0_v1.bgen   
16373K SNPs processed in ukb21008_c12_b0_v1.bgen   
12061K SNPs processed in ukb21008_c13_b0_v1.bgen   
11034K SNPs processed in ukb21008_c14_b0_v1.bgen   
10071K SNPs processed in ukb21008_c15_b0_v1.bgen   
11382K SNPs processed in ukb21008_c16_b0_v1.bgen   
9951K SNPs processed in ukb21008_c17_b0_v1.bgen   
9464K SNPs processed in ukb21008_c18_b0_v1.bgen   
7620K SNPs processed in ukb21008_c19_b0_v1.bgen   
7899K SNPs processed in ukb21008_c20_b0_v1.bgen   
4376K SNPs processed in ukb21008_c21_b0_v1.bgen   
4645K SNPs processed in ukb21008_c22_b0_v1.bgen   
342507087 variant(s) not found in previous data 
604 variant(s) with mismatch information 
66126 variant(s) included 

Calculate MAF and perform filtering on target SNPs 
================================================== 

Calculating allele frequencies: 0.00%Error: ZSTD compression currently unsupported

Error: 
Execution halted

Do you have any idea why PRSice could not compute the scores from these BGEN files, even with --allow-inter?

Thank you very much for your help!

choishingwan commented 1 year ago

Your BGEN files are probably in a newer format that uses zstd compression instead of zlib. For reasons I don't remember, I disabled zstd support (maybe something related to Facebook's license).

So, unfortunately, PRSice doesn't support your BGEN files.
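
One way to confirm which compression a given BGEN file uses is to read the flags field in its header. The sketch below is not part of PRSice. It assumes the header layout described in the BGEN specification (header length at bytes 4-7, flags as the last 4 bytes of the header, compression in the two lowest bits) and a little-endian host; the function names are made up for illustration.

```shell
#!/bin/sh
# Report the compression scheme and layout recorded in a BGEN header.
# Assumed layout (from the BGEN specification): bytes 0-3 hold the offset
# to the first variant block, bytes 4-7 hold the header length L_H, and the
# 4-byte flags field closes the header, so it starts at file byte L_H.
# Assumes a little-endian host, because od decodes in native byte order.

read_u32() {
    # $1 = file, $2 = byte offset; prints the unsigned 32-bit value there
    od -An -tu4 -j "$2" -N 4 "$1" | tr -d '[:space:]'
}

bgen_compression() {
    # Bits 0-1 of the flags: 0 = uncompressed, 1 = zlib, 2 = zstd
    header_len=$(read_u32 "$1" 4)
    flags=$(read_u32 "$1" "$header_len")
    case $((flags & 3)) in
        0) echo none ;;
        1) echo zlib ;;
        2) echo zstd ;;
        *) echo unknown ;;
    esac
}

bgen_layout() {
    # Bits 2-5 of the flags: 1 = BGEN v1.1 layout, 2 = BGEN v1.2 layout
    header_len=$(read_u32 "$1" 4)
    flags=$(read_u32 "$1" "$header_len")
    echo $(( (flags >> 2) & 15 ))
}
```

For example, `bgen_compression ukb21008_c1_b0_v1.bgen` would be expected to print `zstd` for the files in this issue, given the error message above.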

sakuramodokich commented 1 year ago

Thank you for your reply. I believe I can recompress the files from zstd to zlib using qctool. Although it may be time-consuming, it seems to be the only solution.
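
A rough sketch of that recompression, one chromosome at a time. It assumes qctool v2 is installed and that it accepts the `-bgen-compression` option (worth checking against `qctool -help` first); the `_zlib` output names are hypothetical, and the functions only print the commands rather than running them.

```shell
#!/bin/sh
# Sketch: build qctool commands that rewrite each per-chromosome BGEN file
# with zlib compression.
# ASSUMPTIONS: qctool v2 is on PATH and supports -bgen-compression;
# the *_zlib.bgen output names are hypothetical.

recompress_cmd() {
    # $1 = chromosome number; prints the qctool command for that chromosome
    src="ukb21008_c${1}_b0_v1.bgen"
    dst="ukb21008_c${1}_b0_v1_zlib.bgen"
    echo "qctool -g $src -og $dst -bgen-compression zlib"
}

recompress_all() {
    # Prints one command per autosome (chromosomes 1 to 22)
    chr=1
    while [ "$chr" -le 22 ]; do
        recompress_cmd "$chr"
        chr=$((chr + 1))
    done
}
```

Running `recompress_all` lets you review the commands, and `recompress_all | sh` would execute them. The file list passed to `--target-list` (blist.txt here) would then need to point at the new files.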

choishingwan commented 1 year ago

If you are doing that, it's better to use plink2 and convert to bed, as that will be faster downstream.
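
A minimal sketch of that conversion, again printing the commands instead of running them. It assumes plink2 is installed and can read these BGEN files, that the first allele in each file is the reference allele (hence `ref-first`), and that `snps.txt` is a hypothetical file with one variant ID per line; the `ukb_c<chr>` output prefix is also made up. Note that `--make-bed` stores hard calls, so dosage information is not carried over.

```shell
#!/bin/sh
# Sketch: build plink2 commands that convert each per-chromosome BGEN file
# to binary PLINK (bed/bim/fam), keeping only the variants of interest.
# ASSUMPTIONS: plink2 is on PATH; snps.txt (one variant ID per line) and
# the ukb_c<chr> output prefix are hypothetical names.

convert_cmd() {
    # $1 = chromosome number; prints the plink2 command for that chromosome
    echo "plink2 --bgen ukb21008_c${1}_b0_v1.bgen ref-first" \
         "--sample ukb21008_c1_b0_v1.sample" \
         "--extract snps.txt --make-bed --out ukb_c${1}"
}

convert_all() {
    # Prints one command per autosome (chromosomes 1 to 22)
    chr=1
    while [ "$chr" -le 22 ]; do
        convert_cmd "$chr"
        chr=$((chr + 1))
    done
}
```

`convert_all | sh` would run the conversions. The resulting files could then be passed to PRSice with `--target ukb_c# --type bed`, where `#` stands in for the chromosome number.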

sakuramodokich commented 1 year ago

Thank you for your suggestion. However, I am concerned that converting to BED/BIM/FAM may result in data loss. Also, BED files are usually much larger, so I opted to convert only the compression format.

choishingwan commented 1 year ago

The problem with bgen is that we end up converting to bed/bim/fam for clumping anyway, so that will still use a lot of space. As for information loss, our experience is that it's not too significant.

sakuramodokich commented 1 year ago

I understand, thank you!