choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Analysis of the extreme PRS (top 1%, low 1%) #195

Closed giuliapontali closed 4 years ago

giuliapontali commented 4 years ago

I have a question about PRS results from PRSice. I have a binary trait (9730 controls and 156 cases). For the analysis, I took the top 1% of samples by PRS, who should be at high risk of developing the disease. Interestingly, only one participant in the top 1% is affected by the disease, whereas 4 participants in the bottom 1% have it. When I divide the PRS into quantiles, the majority of affected samples fall in the first quantile, but I expected them to be in the tenth. Why does this happen?

Thanks for your time

choishingwan commented 4 years ago

There simply isn't enough information. What is the R2 of the PRS? Is it significant? What does the quantile plot look like? Did you use any covariates? And how do you define the quantiles?

It is definitely possible for most cases to cluster in the lowest quantile, especially when you have a negative correlation between the PRS and your trait of interest.
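
As a rough check on whether "1 case in the top 1% versus 4 in the bottom 1%" is surprising at all, here is a minimal Python sketch (not part of PRSice). It uses only the case and control counts quoted in the question and assumes, purely for illustration, that a 1% bin is taken over the 9886 samples with a phenotype and that case counts in a bin follow a binomial distribution when the PRS carries no information.

```python
from math import comb

# Counts quoted in the thread; the bin definition below is an assumption.
n_cases = 156
n_controls = 9730
n_total = n_cases + n_controls           # samples with a phenotype
bin_size = round(0.01 * n_total)         # number of samples in a 1% bin

prevalence = n_cases / n_total
# Expected number of cases in any 1% bin if the PRS were uninformative.
expected_cases = bin_size * prevalence


def binom_tail(k_min, n, p):
    """Return P(X >= k_min) for X ~ Binomial(n, p)."""
    below = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min))
    return 1.0 - below


p_at_least_4 = binom_tail(4, bin_size, prevalence)
p_at_most_1 = 1.0 - binom_tail(2, bin_size, prevalence)

print(f"samples per 1% bin: {bin_size}")
print(f"expected cases per bin under no association: {expected_cases:.2f}")
print(f"P(at least 4 cases in a bin): {p_at_least_4:.3f}")
print(f"P(at most 1 case in a bin):  {p_at_most_1:.3f}")
```

With so few cases, each 1% bin is expected to hold only one or two of them, so counts in the extreme bins are noisy. This does not by itself explain the pattern across all ten quantiles.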

choishingwan commented 4 years ago

Also, this isn't a bug in PRSice, so please do try to use the appropriate label.

giuliapontali commented 4 years ago

The R2 of the PRS is 0.0178212, which is very low. With sex and age as covariates, the R2 is 0.0208883 and the FULL.R2 is 0.18824. To define the quantiles, I used the following command in R:

prs$decile <- decile(prs$PRS)
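
The thread does not show how `decile()` is implemented, so the direction of its numbering (whether 1 is the lowest or the highest scores) and its handling of ties are worth confirming. The sketch below is a hypothetical rank-based version in Python, with decile 1 holding the lowest scores by assumption, plus a count of cases per decile on made-up toy data.

```python
def assign_deciles(scores):
    """Label each score with a decile by rank (1 = lowest scores, 10 = highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deciles = [0] * len(scores)
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // len(scores) + 1
    return deciles


def cases_per_decile(scores, phenotypes):
    """Count cases (phenotype coded 1) in each decile of the score."""
    counts = {d: 0 for d in range(1, 11)}
    for decile, phenotype in zip(assign_deciles(scores), phenotypes):
        counts[decile] += phenotype
    return counts


# Toy data, invented for illustration: 20 samples, the two highest scores are cases.
toy_scores = [0.01 * i for i in range(20)]
toy_pheno = [0] * 18 + [1, 1]

print(cases_per_decile(toy_scores, toy_pheno))
```

If a decile function numbers bins from the highest score downwards, the same data would place these cases in decile 1 instead of decile 10.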

choishingwan commented 4 years ago

Do you get similar output if you use

Rscript PRSice.R --quantile 10 XXXX

where XXXX stands for your other parameters? (You can add --plot to avoid recalculating the PRS.)

giuliapontali commented 4 years ago

I tried your suggestion but got the following error:

Plotting the quantile plot
WARNING: There are only 0 unique PRS but asked for 10 quantiles
Will not generate the quantile plot

choishingwan commented 4 years ago

There you go. Check the best score output. The scores are likely to be NaN, or your p-value threshold might be too small.

giuliapontali commented 4 years ago

Yes, the p-value threshold is 1.8 × 10^-8. So does this happen because I don't have enough information about the trait that I want to analyze? Do you think the results that I obtained can be useful or not?

choishingwan commented 4 years ago

The error message tells you that you don't have any unique PRS, which is weird. Did you use something like --no-regress? What's the full log from PRSice?

If all your PRS are identical, then the score is not useful at all, as that'd just be a constant.

Sam

giuliapontali commented 4 years ago

Here is my log file:

PRSice 2.2.13 (2020-03-10) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2020-05-13 19:02:49
/usr/local/bin/PRSice \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base /home/atrial_fibrillation/af.qc.txt \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /home/atrial_fibrillation/Cov_9919orderFam.af.txt \
    --ignore-fid  \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --or  \
    --out af.age.sex \
    --pheno /home/atrial_fibrillation/EUR_9919orderFam.af.txt \
    --pvalue Pvalue \
    --seed 2609875795 \
    --snp MarkerName \
    --stat OR \
    --target /home/10k_HRC_imputed/filtered \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: 
/home/10k_HRC_imputed/filtered (bed) 

Start processing af.qc 
================================================== 

Base file: /home/atrial_fibrillation/af.qc.txt 
21993666 variant(s) observed in base file, with: 
15504 NA stat/p-value observed 
4852 ambiguous variant(s) excluded 
21973310 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

4500111 variant(s) not found in previous data 
982676 variant(s) included 

Check Phenotype file: 
/home/atrial_fibrillation/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is ok. 
Phenotype Name: AF 
There are a total of 1 phenotype to process 

Start performing clumping 

Number of variant(s) after clumping : 137039 

AF is a binary phenotype 
33 sample(s) without phenotype 
9730 control(s) 
156 case(s) 

Processing the covariate file: 
/home/atrial_fibrillation/Cov_9919orderFam.af.txt 
============================== 

Include Covariates: 
Name    Missing Number of levels 
AGE 0   - 
SEX 0   - 

After reading the covariate file, 9886 sample(s) included 
in the analysis 

There are 1 region(s) with p-value less than 1e-5. Please 
note that these results are inflated due to the overfitting 
inherent in finding the best-fit PRS (but it's still best 
to find the best-fit PRS!). 
You can use the --perm option (see manual) to calculate an 
empirical P-value. 

Analyzing the .best file, I don't see identical PRS... Sorry for all these questions.

choishingwan commented 4 years ago

No worries. If the best file does not contain any identical PRS, and you are using --quantile 10, then there might be some problem with the Rscript, or some characteristic of your data is causing the problem. Unfortunately, I won't be able to determine which is the cause unless I can get hold of your .best, phenotype and covariate files.

Can you check whether the samples with Yes under the In_Regression column in the .best file have unique PRS? If you merge the .best, phenotype and covariate files together in R, and filter out samples with Yes under the In_Regression column, how many unique PRS do you observe?

Sam
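
The check described above was requested in R. As a language-neutral illustration, here is a minimal Python sketch of the same idea: merge the three tables on IID, keep the samples marked Yes under In_Regression, and count distinct PRS values. The rows are hypothetical and only mimic the column layout of the files discussed in this thread. IDs are deliberately kept as text.

```python
def read_table(text):
    """Parse whitespace-delimited text with a header row into a list of dicts.

    Every value is kept as a string, so sample IDs are never converted to numbers.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


# Hypothetical rows in the same column layout as the .best, pheno and cov files.
best_text = """
FID IID In_Regression PRS
1 1 Yes 0.01736641
2 2 Yes 0.01731742
3 3 No 0.01729993
"""
pheno_text = """
IID AF
1 0
2 1
3 NA
"""
cov_text = """
IID AGE SEX
1 33 1
2 65 1
3 22 1
"""

best = read_table(best_text)
pheno = {row["IID"]: row for row in read_table(pheno_text)}
cov = {row["IID"]: row for row in read_table(cov_text)}

# Inner join on IID across the three tables.
merged = [
    {**b, **pheno[b["IID"]], **cov[b["IID"]]}
    for b in best
    if b["IID"] in pheno and b["IID"] in cov
]
in_regression = [row for row in merged if row["In_Regression"] == "Yes"]
unique_prs = {row["PRS"] for row in in_regression}

print(f"merged rows: {len(merged)}")
print(f"rows with In_Regression == Yes: {len(in_regression)}")
print(f"unique PRS among them: {len(unique_prs)}")
```

A merged table with zero rows, rather than zero distinct scores, would point at the join key instead of the scores themselves.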

giuliapontali commented 4 years ago

Of the 9919 samples, 9875 have Yes under the In_Regression column and unique PRS.

If I merge the .best, phenotype and covariate files together in R and filter out the samples with Yes, I am left with 33 samples with unique PRS.

choishingwan commented 4 years ago

Something is wrong then. There should be 33 samples with No in the In_Regression column (as indicated by the log). I will have to look into it more. It would help a lot if I could have mock data that generates similar output.

giuliapontali commented 4 years ago

Yes, there are 33 samples with No in the In_Regression column (sorry, I explained myself badly).

choishingwan commented 4 years ago

Then I wonder why PRSice said there are 0 unique PRS, it is as if it is reading something wrong.

Mind sending me the first 10 lines of the best, pheno and cov files, with the sample IDs replaced by 1-10? Otherwise, I won't be able to see what the problem is. (Ideally, those 10 lines are samples with Yes under the In_Regression column.)

Sam

giuliapontali commented 4 years ago

Here is the .best file:

FID IID In_Regression PRS 
1   1   Yes 0.01736641  
2   2   Yes 0.01731742  
3   3   Yes 0.01729993  
4   4   Yes 0.01728049  
5   5   Yes 0.01726938  
6   6   Yes 0.01725730  
7   7   Yes 0.01725536  
8   8   Yes 0.01725038  
9   9   Yes 0.01724758  
10  10  Yes 0.01723659  

The Cov file:

IID AGE SEX
1    33     1
2    65     1
3   22  1
4   26  1
5   42  2
6   64  2
7   59  2
8   19  1
9   28  2
10  61  2

The pheno file (I have a binary trait):

IID AF
1   0  
2   0 
3   0 
4   0 
5   0 
6   0 
7   0 
8   0 
9   0 
10  0 

choishingwan commented 4 years ago

If you use this Rscript, what are the on screen outputs?

Sam

giuliapontali commented 4 years ago

Do you mean run PRSice with these 10 samples?

choishingwan commented 4 years ago

No, run the PRSice Rscript with the quantile option just like before. I wrote some print statements to help figure out what the problem might be.

giuliapontali commented 4 years ago

With the --quantile option, I obtained the following output:

PRSice 2.2.13 (2020-03-10) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2020-05-14 19:40:35
/usr/local/bin/PRSice \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base /home/atrial_fibrillation/af.qc.txt \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /home/atrial_fibrillation/Cov_9919orderFam.af.txt \
    --ignore-fid  \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --or  \
    --out af.af.quantile \
    --pheno /home/atrial_fibrillation/EUR_9919orderFam.af.txt \
    --pvalue Pvalue \
    --seed 1004579927 \
    --snp MarkerName \
    --stat OR \
    --target /home/10k_HRC_imputed/filtered \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: 
/home/10k_HRC_imputed/filtered (bed) 

Start processing af.qc 
================================================== 

Reading 100.00%
Base file: /home/atrial_fibrillation/af.qc.txt 
21993666 variant(s) observed in base file, with: 
15504 NA stat/p-value observed 
4852 ambiguous variant(s) excluded 
21973310 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

4500111 variant(s) not found in previous data 
982676 variant(s) included 

Check Phenotype file: 
/home/atrial_fibrillation/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is ok. 
Phenotype Name: AF 
There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%

Number of variant(s) after clumping : 137039 

Processing the 1 th phenotype
AF is a binary phenotype 
33 sample(s) without phenotype 
9730 control(s) 
156 case(s) 

Processing the covariate file: 
/home/atrial_fibrillation/Cov_9919orderFam.af.txt 
============================== 

Include Covariates: 
Name    Missing Number of levels 
AGE 0   - 
SEX 0   - 

After reading the covariate file, 9886 sample(s) included 
in the analysis 

Preparing Output Files

Start Processing
Processing 100.00%
There are 1 region(s) with p-value less than 1e-5. Please 
note that these results are inflated due to the overfitting 
inherent in finding the best-fit PRS (but it's still best 
to find the best-fit PRS!). 
You can use the --perm option (see manual) to calculate an 
empirical P-value. 

Begin plotting
Current Rscript version = 2.2.12
Plotting the quantile plot
WARNING: There are only 0 unique PRS but asked for 10 quantiles
Will not generate the quantile plot for  af.af.quantile
Plotting Bar Plot
Plotting the high resolution plot
Warning message:
In max(pheno$Pheno) : no non-missing arguments to max; returning -Inf

choishingwan commented 4 years ago

Did you use the Rscript I sent you? I don't see the expected output.

(You can also add in the --plot parameter to the Rscript to avoid recalculating the PRS from scratch)

giuliapontali commented 4 years ago

Sorry, but I didn't use your script because I cannot see any script.

choishingwan commented 4 years ago

Oh, GitHub removes the attachment if I reply by email. Here you go: https://www.dropbox.com/s/t8ntgmyayywnqav/PRSice.R?dl=0

giuliapontali commented 4 years ago

Perfect, thanks! Here are the results using the script you provided:

Initializing Genotype file: 
/home/HRC_imputed/filtered (bed) 

Start processing af.qc 
================================================== 

Reading 100.00%
Base file: /home/atrial_fibrillation/af.qc.txt 
21993666 variant(s) observed in base file, with: 
15504 NA stat/p-value observed 
4852 ambiguous variant(s) excluded 
21973310 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

4500111 variant(s) not found in previous data 
982676 variant(s) included 

Check Phenotype file: 
/home/atrial_fibrillation/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is ok. 
Phenotype Name: AF 
There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%

Number of variant(s) after clumping : 137039 

Processing the 1 th phenotype
AF is a binary phenotype 
33 sample(s) without phenotype 
9730 control(s) 
156 case(s) 

Processing the covariate file: 
/home/atrial_fibrillation/Cov_9919orderFam.af.txt 
============================== 

Include Covariates: 
Name    Missing Number of levels 
AGE 0   - 
SEX 0   - 

After reading the covariate file, 9886 sample(s) included 
in the analysis 

Preparing Output Files

Start Processing
Processing 100.00%
There are 1 region(s) with p-value less than 1e-5. Please 
note that these results are inflated due to the overfitting 
inherent in finding the best-fit PRS (but it's still best 
to find the best-fit PRS!). 
You can use the --perm option (see manual) to calculate an 
empirical P-value. 

Begin plotting
Current Rscript version = 2.2.14
Plotting the quantile plot
[1] "PRS"
       FID      IID        PRS
1 321 321 0.01655219
2 296 296 0.01653804
3 771 771 0.01660727
4 134 134 0.01694247
5 950 950 0.01654084
6 006 006 0.01651537
[1] "Pheno"
[1] IID   Pheno
<0 rows> (or 0-length row.names)
[1] "Merge"
[1] IID   FID   PRS   Pheno
<0 rows> (or 0-length row.names)
WARNING: There are only 0 unique PRS but asked for 10 quantiles
Will not generate the quantile plot for  af.help
Plotting Bar Plot
Plotting the high resolution plot
Warning message:
In max(pheno$Pheno) : no non-missing arguments to max; returning -Inf

choishingwan commented 4 years ago

Ok, something is wrong. The Rscript seems to read in the PRS data correctly and filter on the In_Regression column, but then for some reason the structure of the data table changes. Give me some time; I will try to finish what I have on hand and then come back to you. I will definitely need the data to check (or else I will have to keep sending you new copies of the Rscript to continue this print-out debugging).

giuliapontali commented 4 years ago

Ok, many thanks Sam. Unfortunately, I cannot share the data with you; my institute doesn't allow it.

choishingwan commented 4 years ago

Alright, we will do it step by step then

Can you try this? https://www.dropbox.com/s/nlmobx20upugspj/PRSice.R?dl=0

And show me the output? I suspect there might be some problem when we read in the best file

giuliapontali commented 4 years ago

Here is the output:

Start Processing
Processing 100.00%
There are 1 region(s) with p-value less than 1e-5. Please 
note that these results are inflated due to the overfitting 
inherent in finding the best-fit PRS (but it's still best 
to find the best-fit PRS!). 
You can use the --perm option (see manual) to calculate an 
empirical P-value. 

Begin plotting
Current Rscript version = 2.2.14
[1] "Read best"
       FID      IID In_Regression        PRS
1 321 321           Yes 0.01655219
2 296 296         Yes 0.01653804
3 771 771           Yes 0.01660727
4 134 134           Yes 0.01694247
5 950 950           Yes 0.01654084
6 006 006           Yes 0.01651537
[1] "Post subset"
       FID      IID In_Regression        PRS
1 321 321           Yes 0.01655219
2 296 296           Yes 0.01653804
3 771 771           Yes 0.01660727
4 134 134           Yes 0.01694247
5 950 950           Yes 0.01654084
6 006 006           Yes 0.01651537
[1] "base prs is now"
       FID      IID        PRS
1 321 321 0.01655219
2 296 296 0.01653804
3 771 771 0.01660727
4 134 134 0.01694247
5 950 950 0.01654084
6 006 006 0.01651537
Plotting the quantile plot
WARNING: There are only 0 unique PRS but asked for 10 quantiles
Will not generate the quantile plot for  af.help
Plotting Bar Plot
Plotting the high resolution plot
Warning message:
In max(pheno$Pheno) : no non-missing arguments to max; returning -Inf

choishingwan commented 4 years ago

If you install the data.table package, does the same problem happen? It seems like R was unable to read in the best file properly (maybe the mixing of \t and " " is the problem).
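
To illustrate why mixed delimiters can matter in general, here is a small Python sketch. It says nothing about how R or PRSice actually parse these files; it only shows that splitting on a single space and splitting on any run of whitespace give different column counts for lines like those in the cov file. The two sample lines are made up for the example.

```python
# One line padded with several spaces, one line separated by tabs.
space_line = "1    33     1"
tab_line = "3\t22\t1"


def split_on_single_space(line):
    """Split on exactly one space character; repeated spaces yield empty fields."""
    return line.split(" ")


def split_on_any_whitespace(line):
    """Split on runs of spaces and/or tabs."""
    return line.split()


for name, line in [("space-padded", space_line), ("tab-separated", tab_line)]:
    strict = split_on_single_space(line)
    lenient = split_on_any_whitespace(line)
    print(f"{name}: {len(strict)} fields with single-space split, "
          f"{len(lenient)} fields with whitespace split")
```

A reader that assumes one fixed delimiter can therefore see a different number of columns on different lines of the same file.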

choishingwan commented 4 years ago

Another possible problem might be that your FID and IID have some special characters after them, causing problems during the output.

giuliapontali commented 4 years ago

Something does happen to the IDs: for example, one of my samples has the ID 0010197321, but the same sample appears as 10197321 in the screen output. The 00 prefix is lost.

(I am using R version 3.6.3 with data.table package)
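
The loss of the 00 prefix is what happens when an ID is parsed as a number instead of text. The Python sketch below shows the effect on a join, under the assumption (not confirmed in this thread) that one table keeps IDs as text while the other has them converted to numbers. The first ID is the one quoted above; the second is hypothetical.

```python
# IDs as they appear in the files (text, with leading zeros).
best_ids = ["0010197321", "0010197322"]   # second ID is hypothetical
pheno_ids = ["0010197321", "0010197322"]

# Parsing an ID as a number silently drops the leading zeros.
best_ids_as_numbers = [int(sample_id) for sample_id in best_ids]
print(best_ids_as_numbers)

# Joining number-derived IDs against text IDs finds no matches.
matches_after_numeric_parse = [
    str(number) for number in best_ids_as_numbers if str(number) in pheno_ids
]

# Joining text against text finds every sample.
matches_as_text = [sample_id for sample_id in best_ids if sample_id in pheno_ids]

print(f"matches after numeric parsing: {len(matches_after_numeric_parse)}")
print(f"matches when IDs stay text:    {len(matches_as_text)}")
```

A join that returns no rows would be consistent with the empty "Merge" table printed by the debugging script, though the thread does not establish that this is the actual cause.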

giuliapontali commented 4 years ago

Do you suggest replacing " " with "\t" in the phenotype and cov files?

choishingwan commented 4 years ago

That's kind of interesting. Basically, R treated your FID and IID as numbers. However, judging from the log, that doesn't seem to be a problem (note there's a 006 sample). It looks like R correctly recognised the best file as having four columns (FID, IID, In_Regression and PRS), but when it started reading the file, it treated the FID and IID columns as the same, which caused the bug we observed here. It'd be strange if you've been using PRSice directly and didn't touch the best file, as the file should be space-separated and that shouldn't really cause the issue. If you use this build, is the problem solved?

PRSice_linux: https://www.dropbox.com/s/s8ycohvlqcrkj6s/PRSice_linux?dl=0&oref=gp

giuliapontali commented 4 years ago

I cannot open the Dropbox link. It gives a 404 error saying the file is not there.

choishingwan commented 4 years ago

oh... was compiling it for the new release

PRSice_linux: https://www.dropbox.com/s/0kbsw7mr8ahzso4/PRSice_linux?dl=0&oref=gp

choishingwan commented 4 years ago

Just did that again. If you haven't downloaded it, here is the new link. PRSice_linux: https://www.dropbox.com/s/ox0kwhvsxv0jxl3/PRSice_linux?dl=0&oref=gp

giuliapontali commented 4 years ago

Thanks, I am going to try this version

giuliapontali commented 4 years ago

Here is the output with the latest version of PRSice_linux. I don't understand why it doesn't read the phenotype in the pheno file.

Initializing Genotype file: 
/home/HRC_imputed/filtered (bed) 

Start processing af.cleaning 
================================================== 

Base file: 
/home/atrial_fibrillation/af.cleaning.txt 
Header of file is: 

MarkerName  rs_dbSNP147 CHR POS_GRCh37  A1  A2  Freq_A2 Effect_A2   StdErr  Pvalue  OR 

Reading 100.00%
29475783 variant(s) observed in base file, with: 
713528 variant(s) located on haploid chromosome 
14136 NA stat/p-value observed 
5354 ambiguous variant(s) excluded 
28742765 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

3619959 variant(s) not found in previous data 
1862828 variant(s) included 

Phenotype file: /home/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is 
expected. 

There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%
Number of variant(s) after clumping : 527406 

Processing the 1 th phenotype 

No phenotype presented 

Error: 
Execution halted

choishingwan commented 4 years ago

Mind sending me the full log? That is, don't cut out the command line section, as it helps me see whether the command used is correct and also tells me which version of PRSice you are using. I don't expect any PRSice version to produce this output, as we always report whether the phenotype is continuous or binary before reaching the "No phenotype presented" error.
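
The pieces of a log mentioned above can be pulled out mechanically before posting. Here is a minimal Python sketch (a hypothetical helper, not part of PRSice) that extracts the executable version, the Rscript version, and the reported phenotype type. The patterns are modelled on the log lines quoted in this thread; the wording for a continuous phenotype is an assumption, since no such log appears here.

```python
import re


def summarize_log(log_text):
    """Extract version and phenotype information from PRSice screen output."""
    exe = re.search(r"^PRSice (\d+\.\d+\.\d+)", log_text, flags=re.MULTILINE)
    rscript = re.search(r"Current Rscript version = (\d+\.\d+\.\d+)", log_text)
    # "binary" wording is taken from the thread; "continuous" is assumed.
    pheno_type = re.search(r"is a (binary|continuous) phenotype", log_text)
    return {
        "executable_version": exe.group(1) if exe else None,
        "rscript_version": rscript.group(1) if rscript else None,
        "phenotype_type": pheno_type.group(1) if pheno_type else None,
        "has_command_section": "--base" in log_text,
    }


# Abbreviated excerpt assembled from lines quoted earlier in the thread.
sample_log = """PRSice 2.2.13 (2020-03-10)
/usr/local/bin/PRSice \\
    --base /home/atrial_fibrillation/af.qc.txt \\
AF is a binary phenotype
Current Rscript version = 2.2.12
"""

summary = summarize_log(sample_log)
print(summary)
if summary["executable_version"] != summary["rscript_version"]:
    print("Note: executable and Rscript versions differ.")
```

Whether a difference between the executable and Rscript versions matters is not established in this thread. The sketch only makes the difference visible.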

giuliapontali commented 4 years ago

The phenotype is binary. Here is the full log file:

PRSice 2.3.0 (2020-05-18) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2020-05-18 19:58:41
./PRSice_linux \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base /home/atrial_fibrillation/af.cleaning.txt \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /home/atrial_fibrillation/Cov_9919orderFam.af.txt \
    --ignore-fid  \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --or  \
    --out af.help \
    --pheno /home/EUR_9919orderFam.af.txt \
    --pvalue Pvalue \
    --seed 20399376 \
    --snp MarkerName \
    --stat OR \
    --target /home/HRC_imputed/filtered \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: 
/home/HRC_imputed/filtered (bed) 

Start processing af.cleaning 
================================================== 

Base file: 
/home/atrial_fibrillation/af.cleaning.txt 
Header of file is: 

MarkerName  rs_dbSNP147 CHR POS_GRCh37  A1  A2  Freq_A2 Effect_A2   StdErr  Pvalue  OR 

29475783 variant(s) observed in base file, with: 
713528 variant(s) located on haploid chromosome 
14136 NA stat/p-value observed 
5354 ambiguous variant(s) excluded 
28742765 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

3619959 variant(s) not found in previous data 
1862828 variant(s) included 

Phenotype file: /home/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is 
expected. 

There are a total of 1 phenotype to process 

Start performing clumping 

Number of variant(s) after clumping : 527406 

Processing the 1 th phenotype 

No phenotype presented 

choishingwan commented 4 years ago

That's strange, as there should be at least one extra line of output before the "No phenotype presented" message.

Anyway, does your phenotype file contain both the FID and the IID? If so, you shouldn't use --ignore-fid, and you might want to provide the --pheno-col parameter. If your file has FID and IID, then given your input, PRSice is using the IID column as your phenotype, which then leads to the error.

-- Dr Shing Wan Choi Postdoctoral Fellow Genetics and Genomic Sciences Icahn School of Medicine, Mount Sinai, NYC
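
To make the FID/IID point concrete, here is a minimal Python sketch (not part of PRSice) that reads the header of a phenotype file and reports whether it starts with FID and IID. The helper name `describe_pheno_header` and the example file content are made up for illustration; only `--ignore-fid` and `--pheno-col` come from the discussion above.

```python
import io


def describe_pheno_header(handle, pheno_col):
    """Inspect the header line of a whitespace-delimited phenotype file.

    Hypothetical helper: it only looks at column names, not at the data.
    """
    header = handle.readline().split()
    # A file "has FID" here if its first two column names are FID and IID.
    has_fid = (
        len(header) >= 2
        and header[0].upper() == "FID"
        and header[1].upper() == "IID"
    )
    return {
        "columns": header,
        "has_fid": has_fid,
        # Without an FID column, --ignore-fid is the flag discussed above.
        "use_ignore_fid": not has_fid,
        "pheno_col_found": pheno_col in header,
    }


# Made-up example content; the real phenotype file is not reproduced here.
example = io.StringIO("IID AF\nsample1 0\nsample2 1\n")
report = describe_pheno_header(example, "AF")
print(report)
```

If `has_fid` comes back true, the advice above applies: drop `--ignore-fid` and name the phenotype column with `--pheno-col`.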

giuliapontali commented 4 years ago

I don't know why it is not able to read the phenotype. I also added --pheno-col AF. Here is my pheno file:

https://www.dropbox.com/s/fb1g5z9czycy0bs/EUR_9919orderFam.af.txt?dl=0

choishingwan commented 4 years ago

I have downloaded the phenotype file for testing; you can now delete it from Dropbox.

giuliapontali commented 4 years ago

Thanks. If that's not enough, I can also provide the base and target files.

choishingwan commented 4 years ago

Using a simulated target and base file, I tried the following.

When I remove all the leading 0s:

Processing the 1 th phenotype

AF is a binary phenotype
9919 sample(s) without phenotype
None of the target samples were found in the phenotype file. Maybe the first column of your phenotype file is the FID?Or it is possible that only non-founder sample have phenotype information and you did not use --nonfounders?
Error: No sample left

When I use the file as a binary trait file:

Processing the 1 th phenotype

AF is a binary phenotype
9730 control(s)
156 case(s)

When I use the file as a continuous trait:

Processing the 1 th phenotype

AF is a continuous phenotype
0 sample(s) with valid phenotype

(there is a display bug here, but the samples are treated as valid)

Without --ignore-fid:

Error: Not enough column in Phenotype file. If the phenotype does not contain the FID, use --ignore-fid

So I am surprised that you got

Processing the 1 th phenotype

No phenotype presented

without the "AF is a binary phenotype" line.

I have added some error logging to this build; could you try it?

https://www.dropbox.com/s/da2itbo7ifph2jg/PRSice_debug?dl=0
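
The first test above (removing the leading 0s) shows that sample IDs are matched as text, so "0010..." and "10..." count as different samples. The Python sketch below illustrates that general pitfall with made-up IDs. It is not a description of the PRSice bug itself, and `load_ids_as_text` is a hypothetical helper.

```python
import io


def load_ids_as_text(handle):
    """Read the first column of a whitespace-delimited file as strings."""
    next(handle)  # skip the header line
    return [line.split()[0] for line in handle if line.strip()]


# Made-up IDs with leading zeros, standing in for the .fam and pheno files.
fam_ids = {"0010000001", "0010000002"}
pheno = io.StringIO("IID AF\n0010000001 0\n0010000002 1\n")
pheno_ids = load_ids_as_text(pheno)

# Kept as text, every phenotype ID matches a target ID.
text_matches = sum(1 for sample_id in pheno_ids if sample_id in fam_ids)

# Converted to integers and back, the leading zeros are lost and nothing matches.
numeric_ids = [str(int(sample_id)) for sample_id in pheno_ids]
numeric_matches = sum(1 for sample_id in numeric_ids if sample_id in fam_ids)

print(text_matches, numeric_matches)
```

Tools that guess column types (spreadsheets, some table readers) can strip leading zeros silently, so it is worth reading ID columns as text whenever files are prepared outside PRSice.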

giuliapontali commented 4 years ago

The phenotype file has no FID column (it is the same file that I sent you). Here is the output using PRSice_debug (I added --nonfounders):

PRSice 2.3.0 (2020-05-18) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2020-05-19 11:43:09
./PRSice_debug \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base /home/atrial_fibrillation/af.cleaning.txt \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /home/atrial_fibrillation/Cov_9919orderFam.af.txt \
    --ignore-fid  \
    --interval 5e-05 \
    --lower 5e-08 \
    --nonfounders  \
    --num-auto 22 \
    --or  \
    --out af.help \
    --pheno /home/EUR_9919orderFam.af.txt \
    --pheno-col AF \
    --pvalue Pvalue \
    --seed 1955166713 \
    --snp MarkerName \
    --stat OR \
    --target /home/HRC_imputed/filtered \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: 
/home/HRC_imputed/filtered (bed) 

Start processing af.cleaning 
================================================== 

Base file: 
/home/atrial_fibrillation/af.cleaning.txt 
Header of file is: 

MarkerName  rs_dbSNP147 CHR POS_GRCh37  A1  A2  Freq_A2 Effect_A2   StdErr  Pvalue  OR 

Reading 100.00%
29475783 variant(s) observed in base file, with: 
713528 variant(s) located on haploid chromosome 
14136 NA stat/p-value observed 
5354 ambiguous variant(s) excluded 
28742765 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

9919 people (0 male(s), 0 female(s)) observed 
9919 founder(s) included 

3619959 variant(s) not found in previous data 
1862828 variant(s) included 

Phenotype file: /home/EUR_9919orderFam.af.txt 
Column Name of Sample ID: IID 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is 
expected. 

There are a total of 1 phenotype to process 

Start performing clumping 

Clumping Progress: 100.00%
Number of variant(s) after clumping : 527406 

Processing the 1 th phenotype 

Reading /home/EUR_9919orderFam.af.txt
DEBUG ID: 0010197321
DEBUG ID: 0010025296
DEBUG: 0    0   0
No phenotype presented 

Error: 
Execution halted

choishingwan commented 4 years ago

Perfect!!! Found the bug. Thank you so much!

I have now published a 2.3.0a release.

This release should solve the phenotype problem. In addition, I think the updated Rscript should also solve the "no unique PRS" problem.

https://github.com/choishingwan/PRSice/releases/download/2.3.0/PRSice_linux.230a.zip

giuliapontali commented 4 years ago

Great, it works! Just one more question: I would like to generate the quantile plot, but when I use the following command I get this output:

Rscript PRSice.R --prsice ./PRSice_linux --A1 A1 --A2 A2 --base /home/atrial_fibrillation/af.cleaning.txt --binary-target T --ignore-fid --or --pheno /home/EUR_9919orderFam.af.tab.txt --pheno-col AF --nonfounders --pvalue Pvalue --snp MarkerName --stat OR --target /home/HRC_imputed/filtered --cov /home/atrial_fibrillation/Cov_9919orderFam.af.txt --quantile 10 --plot --out af.help
Begin plotting
Current Rscript version = 2.3.0
Plotting the quantile plot
WARNING: There are only 0 unique PRS but asked for 10 quantiles
Will not generate the quantile plot for  af.help
Plotting Bar Plot
Plotting the high resolution plot
Warning messages:
1: In fread(paste0(prefix, ".best"), data.table = F, colClasses = c(FID = "character",  :
  Detected 4 column names but the data has 3 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
2: In max(pheno$Pheno) : no non-missing arguments to max; returning -Inf

I don't understand this line:

WARNING: There are only 0 unique PRS but asked for 10 quantiles

Why does it happen?
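
The fread warning in the output above reports 4 column names but only 3 data columns in the .best file. A quick way to confirm that kind of mismatch is to count fields per row. This is a minimal Python sketch with a made-up file layout; `check_columns` is a hypothetical helper, not part of PRSice or its Rscript.

```python
import io


def check_columns(handle):
    """Compare the number of header names with the number of fields per row.

    Returns the header and a list of (line_number, field_count) for rows
    whose field count differs from the header.
    """
    header = handle.readline().split()
    mismatched = []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if fields and len(fields) != len(header):
            mismatched.append((line_no, len(fields)))
    return header, mismatched


# Made-up content imitating a header with 4 names and rows with 3 fields.
example = io.StringIO(
    "FID IID In_Regression PRS\n"
    "sample1 Yes 0.01\n"
    "sample2 Yes 0.02\n"
)
header, mismatched = check_columns(example)
print(len(header), mismatched)
```

If every row is one field short, the columns shift when the file is read, which would be consistent with the plotting script finding no usable PRS values. That reading is an assumption, not something confirmed in this thread.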

giuliapontali commented 4 years ago

Also, if I divide the PRS into deciles in R with the line prs$decile <- decile(prs$PRS), I obtain this (I removed the sample IIDs from the file):

https://www.dropbox.com/s/2xzsobk4gn1qnuo/PRS_evaluation.txt?dl=0

The strange thing is that the majority of samples with the AF trait are in the first decile and only 10 are in the tenth decile, so the results seem to be swapped. I expected the majority of affected samples to be in the tenth decile.
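
For reference, counting cases per decile can be sketched in a few lines of Python. This is a rank-based stand-in and not the R `decile()` call used above; the function names and the toy data are made up for illustration.

```python
def assign_deciles(scores):
    """Return a decile label (1..10) per score, based on rank.

    Ties are broken by input order; decile 1 holds the lowest scores.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def cases_per_decile(scores, is_case):
    """Count how many cases fall in each decile of the score."""
    counts = {d: 0 for d in range(1, 11)}
    for decile, case in zip(assign_deciles(scores), is_case):
        if case:
            counts[decile] += 1
    return counts


# Toy data: 20 samples with scores 0..19; the 3 highest scores are cases.
scores = [float(i) for i in range(20)]
is_case = [i >= 17 for i in range(20)]
counts = cases_per_decile(scores, is_case)
print(counts)
```

If the real data show cases piling up in decile 1 instead, that matches the earlier remark in this thread that cases can cluster in the lowest quantile when the PRS is negatively correlated with the trait.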

choishingwan commented 4 years ago

The Rscript? If you search the Rscript for .best and look at the read.table and fread lines, do you see colClasses? If you do, then you are using the latest version and you've got a different problem.

Sam

On Wed, 20 May 2020 at 2:04 AM, jPontix notifications@github.com wrote:

I am using version 2.3.0a

giuliapontali commented 4 years ago

The Rscript is the one provided with PRSice version 2.3.0a. When I load the .best file in R, I don't see a colClass column.

The output that I have is in the dropbox