choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Is a summary statistic file (--base) indispensable for PRS scoring in PRSice-2? #309

Open shudanhua opened 2 years ago

shudanhua commented 2 years ago

Dear developers,

I'm trying to use PRSice-2 to build a PRS. As I understand from the tutorial, in step 1 a summary statistic file (--base) and a PLINK file (--target) are used to generate the best-fit PRS model. In step 2, to calculate the PRS for an individual whose phenotype is unknown, (--base --target --no-regression --fastscore --bar-levels [generated from step 1]) should be used.
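For concreteness, the two invocations described above can be sketched as argument lists. This is only an illustration of the question, not a confirmed recipe: the flags `--base`, `--target`, `--no-regression`, `--fastscore` and `--bar-levels` are the ones named in this post, `--out` sets the output prefix, and the executable name, file names and threshold value are hypothetical placeholders.

```python
# Sketch of the two-step workflow as command-line argument lists.
# Assumptions: the executable name, file names and threshold are placeholders.

PRSICE = "PRSice"                 # hypothetical executable name
BASE = "base_sumstats.txt"        # hypothetical summary statistic file
TARGET_KNOWN = "cohort_known"     # hypothetical PLINK prefix, phenotype known
TARGET_NEW = "cohort_new"         # hypothetical PLINK prefix, phenotype unknown
BEST_THRESHOLD = "0.05"           # placeholder for the best-fit threshold from step 1


def step1_args():
    """Step 1: base + target, used to find the best-fit p-value threshold."""
    return [PRSICE, "--base", BASE, "--target", TARGET_KNOWN, "--out", "step1"]


def step2_args(threshold=BEST_THRESHOLD):
    """Step 2: score new samples at one fixed threshold, without regression."""
    return [
        PRSICE, "--base", BASE, "--target", TARGET_NEW,
        "--no-regression", "--fastscore", "--bar-levels", threshold,
        "--out", "step2",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(" ".join(step1_args()))
    print(" ".join(step2_args()))
```

Note that `--base` appears in both argument lists, which is exactly what the question below is about.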

I'm wondering whether step 1 can generate a weighted score for each SNP included in the PRS model, which could then replace the summary statistic file in step 2. For example, like the table below (output from Adaptive MultiBLUP):

Predictor   A1  A2  Centre  Region1  Region2  Region3  Region4  Region5  Region6
rs10737396  T   C   0       0        0        0        0        0        0
rs6674378   G   T   0       0        0        0        0        0        0
rs17374719  G   A   0       0        0        0        0        0        0

Or should the summary statistic file (--base) always be present in each step?

An additional question: I have a cohort of around 5000 cases and 5000 controls, and I am trying to build a PRS based on it. Should I split the cohort into two parts, e.g. 50%/50%, then use one part to generate the summary statistics (for --base) and the other as --target?

Many Thanks! Danhua

choishingwan commented 2 years ago

Yes. Without effect size estimates, we cannot generate a PRS.

Try the GWAS Catalog or the PGS Catalog and see if you can find appropriate effect sizes for PRS construction. Your dataset is relatively small; splitting it in half likely does not give you enough power for PRS analyses.
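To illustrate why effect sizes are required, here is a minimal sketch of a PRS as a weighted sum of effect-allele counts. It is not PRSice-2's actual implementation, whose scoring options may average or standardise the score differently. The SNP IDs are borrowed from the question above; the effect sizes and allele counts are made-up values, and `polygenic_score` is a hypothetical helper name.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of effect size * effect-allele count over SNPs present in both inputs.

    allele_counts: dict of SNP ID -> number of effect alleles (0, 1 or 2)
    effect_sizes:  dict of SNP ID -> per-allele effect size (e.g. from a GWAS)
    """
    shared = [snp for snp in allele_counts if snp in effect_sizes]
    if not shared:
        # With no effect sizes there is nothing to weight the genotypes by.
        raise ValueError("no SNP has both a genotype and an effect size")
    return sum(effect_sizes[snp] * allele_counts[snp] for snp in shared)


# Made-up effect sizes, standing in for the content of a --base file.
example_effects = {"rs10737396": 0.5, "rs6674378": -0.25, "rs17374719": 0.125}

# Made-up effect-allele counts for one individual from the target data.
example_counts = {"rs10737396": 2, "rs6674378": 1, "rs17374719": 0}

if __name__ == "__main__":
    print(polygenic_score(example_counts, example_effects))  # 0.75
```

If the effect-size dictionary is empty, the function raises an error rather than returning a score, which mirrors the point that genotypes alone are not enough.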