choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Is this overfitting? PRSice-2 #294

Open samreenzafer opened 2 years ago

samreenzafer commented 2 years ago

Attachments: quantile plot (DILIN Sakaue2021 GCST90018798 noMHC_QUANTILES_PLOT_2022-04-22), high-resolution plot (DILIN Sakaue2021 GCST90018798 noMHC_HIGH-RES_PLOT_2022-04-22), bar plot (DILIN Sakaue2021 GCST90018798 noMHC_BARPLOT_2022-04-22), log file (DILIN.Sakaue2021.GCST90018798.noMHC.log)

Hi, I'm using PRSice-2 to run hundreds to thousands of traits from various GWAS Catalog studies against my target data (PLINK format, ~12,000 samples with ~1,800 cases). For many traits, like the one attached, the best-fit PRS is achieved at the P-value threshold shown below:

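| Phenotype | Set | Threshold | PRS.R2 | Full.R2 | Null.R2 | Prevalence | Coefficient | Standard.Error | P | Num_SNP | Empirical-P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | Base | 0.0871001 | 0.0861248 | 0.134254 | 0.0526646 | - | -486.689 | 20.645 | 7.08941e-123 | 28988 | 0.000999001 |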

Is this overfitting? The P-value for this trait is 7e-123; isn't that far too low, even though the empirical P is 0.0009? Also, ~30,000 SNPs were used. Out of the 220 traits from this study (Sakaue et al. 2021), 74 have an empirical P of ~0.0009.
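For reference, the two P-values in the row above can be checked by hand. The sketch below is only an illustration, not PRSice-2's own code, and it rests on two assumptions: that the reported P behaves like a two-sided Wald test on Coefficient / Standard.Error, and that the empirical P follows the common (k + 1) / (N + 1) permutation convention with N = 1,000 permutations (the permutation count is not stated in this post). The function names are hypothetical.

```python
import math


def wald_two_sided_p(coefficient: float, standard_error: float) -> float:
    """Two-sided normal-approximation P-value for z = coefficient / SE."""
    z = coefficient / standard_error
    # For a standard normal, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_p(n_null_as_extreme: int, n_permutations: int) -> float:
    """Permutation P-value under the (k + 1) / (N + 1) convention (assumed)."""
    return (n_null_as_extreme + 1) / (n_permutations + 1)


# Values copied from the result row above.
p_wald = wald_two_sided_p(-486.689, 20.645)

# Assumption: 1,000 permutations, none of which beat the observed fit.
p_floor = empirical_p(0, 1000)

print(f"Wald-style P from coefficient and SE: {p_wald:.3e}")
print(f"Smallest possible empirical P with 1,000 permutations: {p_floor:.9f}")
```

Under these assumptions, an empirical P of 0.000999001 is simply the floor of the permutation procedure: it cannot go lower without more permutations, so many traits sharing that exact value would all be hitting the same floor rather than showing equally strong evidence.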

Attached is my log file and plots.

I'm wondering what I'm not understanding about the methodology and interpretation of PRS. What would you suggest I do?

Here are the top 10 P-value thresholds from the .prsice output (2,000 thresholds tested):

Pheno Set Threshold R2 P Coefficient Standard.Error Num_SNP

Thank you in advance for taking the time to look at this.

choishingwan commented 2 years ago

It is difficult to tell without the full context, but it does not seem like there is anything out of the ordinary.