Hi
I'm using PRSice-2 to run hundreds to thousands of traits from various GWAS Catalog studies against my target data (PLINK format, ~12,000 samples with ~1,800 cases).
For many traits like the one attached, the best-fit PRS is achieved at the P-value threshold shown in the summary below:
Phenotype Set Threshold PRS.R2 Full.R2 Null.R2 Prevalence Coefficient Standard.Error P Num_SNP Empirical-P
Base 0.0871001 0.0861248 0.134254 0.0526646 - -486.689 20.645 7.08941e-123 28988 0.000999001
Is this overfitting? The P-value for this trait is 7e-123; isn't that far too low, even though the empirical P is 0.0009? Also, ~30,000 SNPs were used.
Out of the 220 traits from this study (Sakaue et al. 2021), 74 traits have an empirical P of ~0.0009.
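One thing that may be worth checking about the repeated ~0.0009 value: a permutation-based P-value has a floor set by the number of permutations. The following is a minimal sketch, assuming the common (k + 1) / (N + 1) estimator and assuming 1000 permutations were requested. Neither is stated in this post (the log file would confirm the actual setting), and the function name is hypothetical.

```python
def empirical_p(num_null_as_good: int, num_permutations: int) -> float:
    """Permutation P-value using the (k + 1) / (N + 1) estimator.

    num_null_as_good: permutations whose fit was at least as good as the
                      observed one (k).
    num_permutations: total number of permutations run (N, assumed here).
    """
    if num_permutations <= 0:
        raise ValueError("num_permutations must be positive")
    if not 0 <= num_null_as_good <= num_permutations:
        raise ValueError("num_null_as_good must lie between 0 and num_permutations")
    return (num_null_as_good + 1) / (num_permutations + 1)


# Smallest value obtainable when no permutation beats the observed fit,
# under the ASSUMED setting of 1000 permutations.
floor = empirical_p(0, 1000)
print(f"{floor:.6g}")  # prints 0.000999001
```

Under these assumptions the floor equals the reported 0.000999001. If that is what happened here, the value would say only that no permuted phenotype fit better than the observed one; it would be limited by the permutation count rather than by the strength of the association.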
Attached is my log file and plots.
I'm wondering what I'm misunderstanding about the methodology and interpretation of PRS.
What would you suggest I do?
Here are the top 10 P-value thresholds from the .prsice output (2000 thresholds tested):
Pheno Set Threshold R2 P Coefficient Standard.Error Num_SNP
Base 5.005e-05 0.000143812 0.312603 -0.340045 0.336753 29
Base 0.00010005 0.000648264 0.0323814 -0.973314 0.45489 51
Base 0.00015005 0.000787986 0.0182923 -1.3745 0.582504 73
Base 0.00020005 0.000426456 0.0820578 -1.1973 0.688553 99
Base 0.00025005 0.000410892 0.0877659 -1.28292 0.751431 123
Base 0.00030005 0.000565421 0.0452448 -1.69571 0.846849 146
Base 0.00035005 0.000789217 0.0180115 -2.17015 0.917464 168
...
...
Base 0.0999001 0.0800213 8.1373e-121 -517.879 22.1576 32168
Base 0.0999501 0.0799776 9.57176e-121 -517.909 22.1655 32182
Base 0.1 0.0801112 6.39797e-121 -518.784 22.1866 32209
Base 1 0.00222882 6.82245e-05 -342.871 86.0967 165906
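To make it easier to compare thresholds, here is a rough sketch of how rows like these could be parsed and ranked by R2. It uses only a few of the rows quoted above, so the winner it reports is the best among those rows and not the overall best fit, which sits in the elided part of the output. The column list follows the values as they appear in this post (the Pheno value is absent from the rows), and the helper name is hypothetical.

```python
# Column names matching the values as quoted in this post
# (the rows carry no Pheno value, so that column is left out here).
COLUMNS = ["Set", "Threshold", "R2", "P", "Coefficient", "Standard.Error", "Num_SNP"]

# A subset of the rows quoted above; "..." stands for elided rows.
ROWS_TEXT = """\
Base 5.005e-05 0.000143812 0.312603 -0.340045 0.336753 29
...
Base 0.0999001 0.0800213 8.1373e-121 -517.879 22.1576 32168
Base 0.0999501 0.0799776 9.57176e-121 -517.909 22.1655 32182
Base 0.1 0.0801112 6.39797e-121 -518.784 22.1866 32209
Base 1 0.00222882 6.82245e-05 -342.871 86.0967 165906
"""


def parse_rows(text):
    """Turn whitespace-separated rows into dicts with numeric fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or elided ("...") lines
        row = dict(zip(COLUMNS, fields))
        for name in ("Threshold", "R2", "P", "Coefficient", "Standard.Error"):
            row[name] = float(row[name])
        row["Num_SNP"] = int(row["Num_SNP"])
        rows.append(row)
    return rows


rows = parse_rows(ROWS_TEXT)
# Rank by variance explained; the row with the largest R2 is the best fit
# among the rows supplied.
best = max(rows, key=lambda r: r["R2"])
print(best["Threshold"], best["R2"], best["Num_SNP"])
```

Running the same ranking on the full .prsice file (all 2000 thresholds) would show how sharply R2 peaks around the best threshold compared with its neighbours.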
Thank you in advance for taking the time to look at this.
DILIN.Sakaue2021.GCST90018798.noMHC.log