choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

PRSice does not run if no phenotype is presented- intentional? #289

Closed MattCloward closed 2 years ago

MattCloward commented 2 years ago

I am getting an error and wanted to check whether what I am observing is a bug or intentional design. When all the phenotypes for the samples in my fam file are NA, I get "No phenotype presented". If they are all -9, I get "No phenotype presented" with the --stat OR flag, and with the --stat BETA flag I get:

Only one phenotype value detected and they are all -9. 
Not enough valid phenotype

I saw that a similar bug was patched in version v2.3.0.a, but I'm not sure if my issue is related. My assumption is that I can run PRSice2 without providing a phenotype. This is necessary, especially when analyzing young individuals for a late-onset disease such as Alzheimer's disease. Is it possible to run PRSice2 without a phenotype?

I am running PRSice 2.3.5 on a Red Hat Linux Server. My project has the following structure:

/ (project root)
  /PRSice_linux
    PRSice.R
    PRSice_linux
  /test-inputs
    test-OR.assoc
    test-beta.assoc
    test.bed
    test.bim
    test.fam

I ran the following two commands, one for OR and one for BETA (notice I don't specify a .pheno file):

Rscript "./PRSice_linux/PRSice.R" --dir . --prsice ./PRSice_linux/PRSice_linux --base test-inputs/test-OR.assoc --target test-inputs/test --thread 1 --stat OR --binary-target T
Rscript "./PRSice_linux/PRSice.R" --dir . --prsice ./PRSice_linux/PRSice_linux --base test-inputs/test-beta.assoc --target test-inputs/test --thread 1 --stat BETA --beta --binary-target F

Here are the contents of the relevant files:

test-OR.assoc

SNP CHR BP A1 A2 P OR
rs10757274 9 22096055 A G 2e-33 1.24

test-beta.assoc

SNP CHR BP A1 A2 P BETA
rs10786714 10 104598606 C G 2e-07 0.110168
rs114811870 6 31324938 T C 5e-06 0.383654

test.bim

6   rs114811870 0   31324938    C   T
9   rs10757274  0   22096055    G   A
10  rs10786714  0   104598606   G   C

test.fam

0   SAMP001 0   0   0   NA
0   SAMP002 0   0   0   NA
0   SAMP003 0   0   0   NA
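
Before running PRSice, it can help to confirm what the phenotype column actually contains. The sketch below is not part of PRSice; it recreates the three-sample test.fam above in a temporary file and tabulates the distinct values in column 6 with awk. The variable names `fam` and `pheno_counts` are illustrative only.

```shell
# Recreate the three-sample test.fam from this post in a temporary file.
fam=$(mktemp)
printf '0 SAMP001 0 0 0 NA\n0 SAMP002 0 0 0 NA\n0 SAMP003 0 0 0 NA\n' > "$fam"

# Column 6 of a PLINK .fam file holds the phenotype.
# Count how many samples carry each distinct value.
pheno_counts=$(awk '{ count[$6]++ } END { for (v in count) print v, count[v] }' "$fam")
echo "$pheno_counts"

# Clean up the temporary file.
rm -f "$fam"
```

For the file shown here this prints `NA 3`, i.e. every sample has a missing phenotype. Pointing awk at `test-inputs/test.fam` instead of the temporary file checks the real input.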

OR log (phenos all NA):

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2022-02-25 12:18:10
./PRSice_linux/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base test-inputs/test-OR.assoc \
    --binary-target T \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --or  \
    --out PRSice \
    --pvalue P \
    --seed 2440930204 \
    --snp SNP \
    --stat OR \
    --target test-inputs/test \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: test-inputs/test (bed) 

Start processing test-OR 
================================================== 

Base file: test-inputs/test-OR.assoc 
Header of file is: 
SNP CHR BP A1 A2 P OR 

Reading 100.00%
1 variant(s) observed in base file, with: 
1 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

3 people (0 male(s), 0 female(s)) observed 
3 founder(s) included 

9564 variant(s) not found in previous data 
1 variant(s) included 

There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%
Number of variant(s) after clumping : 1 

Processing the 1 th phenotype 

No phenotype presented 

Error: 
Execution halted

BETA log (phenos all NA):

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2022-02-25 12:21:42
./PRSice_linux/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base test-inputs/test-beta.assoc \
    --beta  \
    --binary-target F \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --out PRSice \
    --pvalue P \
    --seed 208934505 \
    --snp SNP \
    --stat BETA \
    --target test-inputs/test \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: test-inputs/test (bed) 

Start processing test-beta 
================================================== 

Base file: test-inputs/test-beta.assoc 
Header of file is: 
SNP CHR BP A1 A2 P BETA 

Reading 100.00%
2 variant(s) observed in base file, with: 
1 ambiguous variant(s) excluded 
1 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

3 people (0 male(s), 0 female(s)) observed 
3 founder(s) included 

9564 variant(s) not found in previous data 
1 variant(s) included 

There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%
Number of variant(s) after clumping : 1 

Processing the 1 th phenotype 

No phenotype presented 

Error: 
Execution halted

To summarize, is it possible to run PRSice2 without knowing or imputing phenotype? If we want to use polygenic risk scores to predict phenotype for late onset diseases such as Alzheimer's disease, we can't give PRSice2 a phenotype.

choishingwan commented 2 years ago

Use --no-regress

That will generate the PRS without needing the phenotypes. The problem is that you won't know which threshold is best for prediction; you'll need a phenotype to optimize that (or you can use an established threshold based on publications).
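
As a minimal sketch of that suggestion, here is the OR command from the original post with --no-regress appended. All paths are the ones from the project layout in the post. The `prsice_cmd` variable and the existence check are only there so the snippet prints the command and skips the run on a machine where PRSice is not installed.

```shell
# Arguments from the original OR command, plus --no-regress.
set -- --dir . --prsice ./PRSice_linux/PRSice_linux \
    --base test-inputs/test-OR.assoc --target test-inputs/test \
    --thread 1 --stat OR --binary-target T \
    --no-regress

# Show the full command line that would be executed.
prsice_cmd="Rscript ./PRSice_linux/PRSice.R $*"
echo "$prsice_cmd"

# Run it only if the wrapper script and Rscript are actually available.
if [ -f ./PRSice_linux/PRSice.R ] && command -v Rscript >/dev/null 2>&1; then
    Rscript ./PRSice_linux/PRSice.R "$@"
fi
```

If you settle on a published threshold instead, PRSice's --bar-levels option together with --fastscore should restrict scoring to the listed thresholds; check the documentation for your version before relying on that.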

MattCloward commented 2 years ago

That did it! Thank you very much!

MattCloward commented 2 years ago

One more question on this topic: is it possible to run bgen files without a phenotype file? Even with the "--no-regress" flag, I get the following error: Error: You must provide a phenotype file for bgen format!

The log:

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2022-03-21 15:30:17
./PRSice_linux/PRSice_linux \
    --a1 A1 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base ./assocs/assoc.assoc \
    --binary-target T \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --interval 5e-05 \
    --lower 5e-08 \
    --no-regress  \
    --num-auto 22 \
    --or  \
    --out ./prsice2-out/out_bgen_assoc \
    --pvalue P \
    --seed 529495180 \
    --snp SNP \
    --stat OR \
    --target data/out_bgen \
    --thread 1 \
    --type bgen \
    --upper 0.5

Error: You must provide a phenotype file for bgen format! 

Error: 
Execution halted

choishingwan commented 2 years ago

Nope, not at the moment. You can supply a sample file as an external fam though, e.g. --target XXX,yyy.sample --no-regress

Sam
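
A sketch of how that might look for the bgen run in this thread, calling the binary directly with a subset of the options shown in the log. The sample file name `data/out_bgen.sample` is an assumption: the thread only gives the placeholder `yyy.sample`, so substitute the real path. The `sample_file` and `bgen_cmd` names are illustrative.

```shell
# Hypothetical sample file name; the thread only says "yyy.sample".
sample_file=data/out_bgen.sample

# Options taken from the log above, with the sample file appended to --target.
set -- --base ./assocs/assoc.assoc \
    --target "data/out_bgen,${sample_file}" --type bgen \
    --stat OR --or --binary-target T --thread 1 \
    --out ./prsice2-out/out_bgen_assoc \
    --no-regress

# Show the full command line that would be executed.
bgen_cmd="./PRSice_linux/PRSice_linux $*"
echo "$bgen_cmd"

# Run it only if the PRSice binary is actually present and executable.
if [ -x ./PRSice_linux/PRSice_linux ]; then
    ./PRSice_linux/PRSice_linux "$@"
fi
```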
