Why is the R2 computed between the trait and the PRS different from the PRSice results?

choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores

http://prsice.info

GNU General Public License v3.0


Why is the R2 computed between the trait and the PRS different from the PRSice results? #324

Open Antonio-Nappi opened 1 year ago

Antonio-Nappi commented 1 year ago

I don't understand why the R2 value I compute between my trait and the PRS obtained with PRSice is very different from the one reported by PRSice: I obtain a value of -0.47, while PRSice reports a PRS.R2 of 0.075 in its summary file. I have also attached the plots generated by PRSice. The plots look strange; how should I interpret them?

[Attached images: prsice_results_BARPLOT_2023-06-13 (bar plot), prsice_results_HIGH-RES_PLOT_2023-06-13 (high-resolution plot), prsice_results_QUANTILES_PLOT_2023-06-13 (quantile plot)]

choishingwan commented 1 year ago

There's no way an R2 can be negative, as it is r squared (r^2).



Antonio-Nappi commented 1 year ago

I am using the r2_score function from scikit-learn. According to its documentation, "Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). In the general case when the true y is non-constant, a constant model that always predicts the average y would get a score of 0.0." What about the plots that I posted? The high-resolution plot is weird, isn't it?
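A minimal sketch of how the two numbers can disagree, assuming the raw PRS was passed to the scoring function as if it were a prediction of the trait (the thread does not say what the actual mistake was). The coefficient of determination, 1 - SS_res / SS_tot, is written out by hand here instead of calling scikit-learn so the example needs only the standard library. The data are simulated and all names and values are hypothetical.

```python
import random
from statistics import mean

def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, treating y_pred as direct predictions of y_true."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def squared_pearson_r(x, y):
    """Squared Pearson correlation; equals the R2 of a simple linear regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)
# Simulated (hypothetical) data: PRS on a small arbitrary scale, trait in BMI-like units.
prs = [rng.gauss(0.0, 0.01) for _ in range(500)]
bmi = [27.0 + 100.0 * s + rng.gauss(0.0, 4.0) for s in prs]

# The raw PRS is not on the trait's scale, so used as a "prediction" it is far
# worse than predicting the mean, and the score is negative.
raw_score = coefficient_of_determination(bmi, prs)

# Regressing the trait on the PRS first gives a value bounded between 0 and 1.
r_squared = squared_pearson_r(prs, bmi)

print(f"score with raw PRS as prediction: {raw_score:.3f}")
print(f"squared correlation:              {r_squared:.3f}")
```

The first number depends on the scale and offset of the PRS, while the second does not, which is why only the second is comparable to a regression-based R2.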

choishingwan commented 1 year ago

All your p-values are 0, which can cause the high-resolution plot (which operates on -log10 p) to look abnormal. As for the R2, the way we calculated it was 1 - (1 - full model R2) / (1 - null model R2). I am not sure how you implemented your Python script, and I'd guess there's some error in how you used the r2_score function that might have led to this drastic difference in results.
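The formula quoted in this reply can be written as a small helper. This is only a sketch of the expression as stated here, not PRSice's source code; the function name and the example values are hypothetical. "Full model" is taken to mean covariates plus PRS and "null model" covariates only, which is an assumption about the wording.

```python
def prs_r2(full_r2, null_r2):
    """R2 attributed to the PRS, using the formula quoted in the reply:
    1 - (1 - full model R2) / (1 - null model R2)."""
    if not (0.0 <= null_r2 < 1.0) or not (0.0 <= full_r2 <= 1.0):
        raise ValueError("R2 values must lie in [0, 1], with null_r2 < 1")
    return 1.0 - (1.0 - full_r2) / (1.0 - null_r2)

# Hypothetical values, for illustration only.
with_covariates = prs_r2(full_r2=0.26, null_r2=0.20)

# With no covariates the null model explains nothing (null R2 = 0),
# and the expression reduces to the full model R2.
no_covariates = prs_r2(full_r2=0.30, null_r2=0.0)

print(with_covariates, no_covariates)
```

Under this formula the value is never negative as long as the full model fits at least as well as the null model.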

Antonio-Nappi commented 1 year ago

Hey, I found the error with the R2; now it's fine. As for the p-values that are all 0, do you think that's an error? The trait that I am studying is BMI.

choishingwan commented 1 year ago

It is possible for BMI to have p-values of 0, as the GWAS is powerful, though I would also look out for sample overlap, which can sometimes lead to inflation in test statistics.
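To illustrate why p-values of 0 upset a plot on the -log10 p scale: a p-value smaller than the smallest double-precision number is stored as exactly 0, and -log10(0) is not finite. The sketch below clamps p to a floor before transforming it. The helper name and the choice of floor are assumptions for illustration, not a PRSice option.

```python
import math
import sys

def neg_log10_p(p, floor=sys.float_info.min):
    """-log10(p), with p clamped to a floor so that p == 0 stays finite.

    The floor (smallest positive normal double) is an illustrative choice.
    """
    return -math.log10(max(p, floor))

# A value below the double-precision range parses as exactly 0.0.
tiny = float("1e-400")
print(tiny == 0.0)

# math.log10(0.0) would raise ValueError; the clamped version stays finite.
print(neg_log10_p(tiny))
print(neg_log10_p(1e-8))
```

All p-values clamped this way share the same plotted height, so the plot still cannot rank them, but it no longer breaks on infinite values.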