getian107 / PRScsx

Cross-population polygenic prediction
MIT License

summary statistic SNP information #32

Closed SheaCheng2000 closed 12 months ago

SheaCheng2000 commented 1 year ago

Hi Tian,

I have recently been using EUR GWAS summary statistics to construct a PRS in an Asian population, and I would prefer to use only the significant SNPs to calculate the score. Is it acceptable to supply only the significant SNPs from the summary statistics as the input for '--sst_file', or is it necessary to provide the entire set of GWAS summary statistics?

Thanks a lot!

getian107 commented 1 year ago

Technically you can use significant SNPs as the input. However, that's NOT the intended use of PRS-CS. The goal of PRS-CS (and all other Bayesian polygenic prediction methods) is to appropriately model LD so that you can use full GWAS summary statistics as input, without arbitrary SNP selection, which can lose information. If you only want to use significant SNPs to build a score, you are essentially using a method called "pruning and thresholding" (P+T), and there is dedicated software for this method (e.g., PRSice). Note that in most scenarios, Bayesian PRS would be more accurate than P+T.
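To make the "lose information" point concrete, here is a minimal Python sketch of the thresholding step alone. It is not how PRSice or PRS-CS is implemented, and it leaves out LD pruning entirely. The rsIDs, effect sizes and p-values are invented, and `threshold_snps` and the 5e-8 cutoff are hypothetical choices made only for this illustration.

```python
# Toy GWAS summary statistics. Every rsID, effect size and p-value is invented.
SUMSTATS = [
    {"SNP": "rs0001", "A1": "A", "A2": "G", "BETA": 0.120, "P": 3e-9},
    {"SNP": "rs0002", "A1": "C", "A2": "T", "BETA": -0.080, "P": 1e-8},
    {"SNP": "rs0003", "A1": "G", "A2": "A", "BETA": 0.015, "P": 2e-3},
    {"SNP": "rs0004", "A1": "T", "A2": "C", "BETA": -0.010, "P": 0.04},
    {"SNP": "rs0005", "A1": "A", "A2": "C", "BETA": 0.004, "P": 0.31},
]


def threshold_snps(rows, p_cutoff=5e-8):
    """Keep SNPs with P below p_cutoff (thresholding only, no LD pruning)."""
    return [row for row in rows if row["P"] < p_cutoff]


significant = threshold_snps(SUMSTATS)
# SNPs removed by the cutoff; a Bayesian method given the full file would
# still see their (small) effect estimates.
discarded = [row["SNP"] for row in SUMSTATS if row not in significant]

print(f"full input: {len(SUMSTATS)} SNPs")
print(f"thresholded input: {len(significant)} SNPs")
print(f"SNPs dropped by the cutoff: {discarded}")
```

In this toy table the three sub-threshold SNPs still carry small effect estimates. Passing only `significant` to '--sst_file' would hide them from the model, which is the selection step that full summary statistics avoid.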

SheaCheng2000 commented 1 year ago

Hi Tian,

Thank you so much for your kind reply! It really deepens my understanding of these two methods.

I have actually tried PRSice with significant SNPs before. It performs well, but my one concern is transferability from EUR to EAS (I am not sure whether SNPs derived from a EUR GWAS can be used directly in the EAS population). So I turned to PRS-CS/PRS-CSx. I then found that PRS-CSx is better suited to summary statistics from more than one ancestry, while I only have one set of EUR meta summary statistics.

As a next step, I think I will try PRS-CS with the full GWAS summary statistics. I also found some published methods that aim to improve transferability, e.g. TL-PRS (https://www.sciencedirect.com/science/article/pii/S000292972200413X#bib9; it also utilizes PRS-CS), but I am not sure whether they will work. I would greatly appreciate any advice you might have on the direction I'm considering.

Thanks again! Shea

getian107 commented 1 year ago

Hi Shea - You are correct that PRS-CSx is designed for GWAS summary statistics from multiple population groups. If you only have EUR sumstats, you could run PRS-CS and test the resulting PRS in EAS or other target populations. In most cases, PRS-CS is expected to outperform P+T although the prediction accuracy in EAS will be lower than that in EUR.
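As a rough illustration of "test the resulting PRS in a target population", the sketch below applies per-SNP weights to effect-allele dosages and sums them per individual. It is only an assumed workflow, not PRS-CS code: the weights, sample IDs and dosages are invented, `polygenic_score` is a hypothetical helper, and in practice this step is usually done with a dedicated scoring tool rather than by hand.

```python
def polygenic_score(weights, dosages):
    """Sum weight * effect-allele dosage over SNPs present in both inputs.

    SNPs with a weight but no genotype in the target sample are skipped,
    which is one reason a EUR-derived score can carry less signal elsewhere.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical per-SNP weights, standing in for effect sizes from a EUR analysis.
weights = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.02}

# Hypothetical target-sample dosages (0, 1 or 2 copies of the effect allele).
target_sample = {
    "id1": {"rs0001": 2, "rs0002": 0, "rs0003": 1},
    "id2": {"rs0001": 0, "rs0002": 2},  # rs0003 not genotyped in this sample
}

scores = {iid: polygenic_score(weights, d) for iid, d in target_sample.items()}

for iid, score in scores.items():
    print(iid, round(score, 4))
```

The scores would then be compared against the phenotype in the target sample (for example by correlation or variance explained) to see how much accuracy is retained outside EUR.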

SheaCheng2000 commented 1 year ago

Sure! Thank you!!