getian107 / PRScsx

Cross-population polygenic prediction
MIT License

Optimal choices of tuning parameters for traits with heritability #33

Closed harryyiheyang closed 10 months ago

harryyiheyang commented 1 year ago

Dear Dr. Tian Ge,

First, I would like to thank you and your team for developing PRS-CSx, a powerful Bayesian method. I have encountered some challenges while using this tool in my genetic studies, particularly when dealing with traits that have low heritability.

I noticed that PRSCSX has numerous tuning parameters or hyperparameters. When working with traits that have low heritability, such as gut microbiota GWAS summary data, I find that the number of variants with P-values below 5E-5 is relatively small.

In this context, I have the following questions:

  1. How should the tuning parameters be appropriately set to achieve more accurate results?
  2. Would using the default settings potentially result in too few variants passing through variable selection, thereby affecting the performance of the model?
  3. Do you have any specific recommendations or best practices for dealing with this type of trait?

Thank you!

getian107 commented 1 year ago

Hi - To answer your questions:

  1. The main tuning parameters are the global shrinkage parameter phi and the weights of the linear combination of population-specific PRS. If you have a validation dataset, you can use it to learn the values of these parameters that maximize prediction accuracy for a specific target population. If an independent validation dataset is unavailable, you can instead use the auto algorithm together with the --meta option, so that no hyperparameters need to be tuned. We don't recommend tuning other hyperparameters, such as a and b in the prior.
  2. PRS-CS and PRS-CSx don't perform variable selection. We use all HapMap3 variants that are available in both the GWAS summary statistics and the target dataset to make predictions, regardless of how the hyperparameters are set. For well-imputed GWAS and target datasets, the number of variants used to build the PRS is usually somewhere between 700K and 1.2M.
  3. The predictive power for low-heritability traits will be low because prediction accuracy is bounded by the heritability. Make sure that you use the most powerful GWAS available to train the PRS, but there is no specific recommendation in terms of setting hyperparameters.
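
The validation-based tuning described in point 1 could be sketched roughly as follows. This is a minimal illustration, not part of PRS-CSx itself: it assumes you have already run PRS-CSx at several values of phi and scored the validation individuals, so that for each phi you hold a matrix with one column per population-specific PRS. The helper names (`standardize`, `fit_combination`, `r_squared`, `select_phi`) are hypothetical, and ordinary least squares is used here as one plausible way to learn the combination weights for a quantitative trait.

```python
import numpy as np


def standardize(x):
    """Scale each column to mean 0 and standard deviation 1.

    Assumes no column is constant (a zero SD would divide by zero).
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def fit_combination(prs, y):
    """Least-squares weights (intercept first) for combining the
    standardized population-specific PRS columns in `prs`."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), standardize(prs)])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def r_squared(prs, y, weights):
    """Proportion of phenotypic variance explained by the combined PRS."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), standardize(prs)])
    residual = y - design @ weights
    return 1.0 - residual.var() / y.var()


def select_phi(prs_by_phi, y):
    """Return (phi, weights, r2) for the phi with the highest validation R^2.

    prs_by_phi: dict mapping each candidate phi to an
                (n_individuals, n_populations) array of PRS (hypothetical input).
    y:          quantitative phenotype of the same validation individuals.
    """
    best = None
    for phi, prs in prs_by_phi.items():
        weights = fit_combination(prs, y)
        r2 = r_squared(prs, y, weights)
        if best is None or r2 > best[2]:
            best = (phi, weights, r2)
    return best
```

Because phi and the weights are both chosen on the validation data, the R^2 returned here is optimistic; the selected phi and weights should be applied to an independent testing dataset to get an honest estimate of prediction accuracy.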