getian107 / PRScsx

Cross-population polygenic prediction
MIT License

Re: Output #14

Closed ttumkaya closed 2 years ago

ttumkaya commented 2 years ago

Hi Tian,

Thank you for this amazing tool!

I'm using PRScsx to calculate scores from EUR and AFR populations, and have a quick question. The output files are separate for the two populations unless the --meta flag is set to TRUE. Do these meta-analyzed results correspond to the "Final PRS" in Figure 1 of the preprint?

Best, Tayfun

getian107 commented 2 years ago

Hi Tayfun- The final PRS in Figure 1 corresponds to a linear combination of the population-specific PRS learnt in the validation dataset. The --meta flag provides a specific way to combine population-specific PRS (i.e., by meta-analyzing the posterior SNP weights), which is usually less accurate than learning the linear combination in the target dataset but can be useful in certain scenarios (e.g., when the sample size of the target dataset is small).

ttumkaya commented 2 years ago

I see, thanks for the clarification, Tian. So the --meta output is basically an inverse-variance-weighted meta-analysis.

PRScsx is giving me SNP weights for the two populations separately (EUR and AFR), but not the integrated version. Is it supposed to provide the linear combination of the population-specific PRS as well, similar to what you used in the paper?

getian107 commented 2 years ago

Yes, --meta is an inverse-variance-weighted meta-analysis of the posterior SNP weights.
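For readers unfamiliar with the term, here is a minimal sketch of a generic fixed-effect inverse-variance-weighted average applied to per-SNP weights. It only illustrates the general technique and is not taken from the PRS-CSx source. The function name `ivw_meta`, the input layout, and all numbers are made up for illustration, and the software's internal computation may differ in detail.

```python
import numpy as np

def ivw_meta(weights, variances):
    """Generic inverse-variance-weighted average across populations.

    weights, variances: array-likes of shape (n_populations, n_snps).
    Returns one combined weight per SNP.
    """
    weights = np.asarray(weights, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances  # more precise estimates get more weight
    return (precision * weights).sum(axis=0) / precision.sum(axis=0)

# Hypothetical posterior SNP weights and their variances (rows: EUR, AFR).
w = [[0.10, -0.02, 0.00],
     [0.06,  0.01, 0.04]]
v = [[0.01,  0.04, 0.02],
     [0.04,  0.04, 0.02]]

meta = ivw_meta(w, v)
print(meta)
```

When both populations have equal variance at a SNP, the result is the plain average of the two weights; otherwise it is pulled toward the more precisely estimated one.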

The linear combination is not part of the software because the combination is learnt for individual-level PRS while the software works on summary statistics only. You need to take the population-specific SNP weights output from PRS-CSx, calculate population-specific PRS in your individual-level target dataset, and then fit a linear regression like y ~ covariates + PRS_EUR + PRS_AFR to learn the coefficients of the linear combination.
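The steps described above can be sketched as follows. This is a hedged illustration on synthetic data, not part of PRS-CSx: the genotype matrix, the SNP weights, the single covariate `age`, and the simulated effect sizes are all assumptions standing in for a real target dataset and real PRS-CSx output files.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 200  # synthetic individuals and SNPs (illustrative sizes)

# Synthetic stand-ins for target genotypes (0/1/2 dosages) and for the
# population-specific SNP weights that PRS-CSx would output.
genotypes = rng.integers(0, 3, size=(n, m)).astype(float)
w_eur = rng.normal(0.0, 0.05, size=m)
w_afr = rng.normal(0.0, 0.05, size=m)

def standardize(x):
    """Center to mean 0 and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()

# Step 1: population-specific PRS for each individual.
prs_eur = standardize(genotypes @ w_eur)
prs_afr = standardize(genotypes @ w_afr)

# Synthetic covariate and phenotype (simulated effects are made up).
age = rng.normal(50.0, 10.0, size=n)
y = 0.6 * prs_eur + 0.3 * prs_afr + 0.02 * age + rng.normal(0.0, 1.0, size=n)

# Step 2: fit y ~ covariates + PRS_EUR + PRS_AFR by ordinary least squares.
X = np.column_stack([np.ones(n), age, prs_eur, prs_afr])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b_eur, b_afr = coef[2], coef[3]

# Step 3: the combined score is the fitted linear combination of the two PRS.
final_prs = b_eur * prs_eur + b_afr * prs_afr
print(b_eur, b_afr)
```

In a real analysis the coefficients would be learnt in a validation subset and the combined score evaluated in a separate testing subset, which is why a very small target dataset makes this approach hard to apply.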

ttumkaya commented 2 years ago

Oh, I see. Then I suppose (B1*PRS_EUR + B2*PRS_AFR) would be the final PRS in the preprint.

One last thing: what do you consider a small target dataset for this calculation? My dataset has N=38, so I think I might be better off with the META SNP weights.

getian107 commented 2 years ago

Yes- the linear combination of population-specific PRS is the final PRS in the preprint.

N=38 sounds quite small for validation/testing. Usually a few hundred samples (or cases for binary phenotypes) are needed to get stable estimates of performance metrics. In your case I think you can try the meta option so you don't need to split the target dataset further.

ttumkaya commented 2 years ago

Thanks a lot for the prompt and very helpful responses again, Tian!