Computing my own reference panel + other questions

privefl commented 3 years ago

In one of my papers, I defined several ancestry groups in the UK Biobank (see table1):

I would like to run PRS-CSx on training set 2, i.e., using 7 different ancestry groups.

Is running PRS-CSx possible/suitable for this particular use case?
How could I compute my own reference panel for all these groups? Do you have a script to do this from, e.g., PLINK .bed files?
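Some background on the .bed-file question. PLINK 1 binary .bed files in SNP-major mode start with the three magic bytes 0x6C 0x1B 0x01. Each SNP is then stored in ceil(n/4) bytes, two bits per sample, lowest-order bits first. The sketch below decodes that layout into A1 allele dosages in plain Python. `decode_bed_variant` and `read_bed` are hypothetical helper names, and the sample and SNP counts are assumed to come from the matching .fam and .bim files. For real UK Biobank data, a dedicated compiled reader would be much faster.

```python
def decode_bed_variant(record, n_samples):
    """Decode one SNP-major .bed record into A1 allele dosages.

    Each byte packs 4 samples, 2 bits per sample, lowest-order bits first.
    Codes: 00 = homozygous A1, 01 = missing, 10 = heterozygous, 11 = homozygous A2.
    """
    code_to_dosage = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
    dosages = []
    for i in range(n_samples):
        bits = (record[i // 4] >> (2 * (i % 4))) & 0b11
        dosages.append(code_to_dosage[bits])
    return dosages


def read_bed(path, n_samples, n_snps):
    """Read a whole .bed file; n_samples = lines in .fam, n_snps = lines in .bim."""
    bytes_per_snp = (n_samples + 3) // 4  # each SNP record is padded to whole bytes
    with open(path, "rb") as f:
        if f.read(3) != b"\x6c\x1b\x01":
            raise ValueError("not a SNP-major PLINK 1 .bed file")
        return [decode_bed_variant(f.read(bytes_per_snp), n_samples)
                for _ in range(n_snps)]
```

For example, `decode_bed_variant(bytes([0b11100100]), 4)` returns `[2, None, 1, 0]`.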

getian107 commented 3 years ago

I think it's technically possible, but so far our evaluation of the method has focused on the super-population level. I'm not sure whether dividing continental populations into smaller subgroups would help or harm prediction, as many of the GWAS may be underpowered and the differences in effect sizes and LD patterns between some subpopulations may be limited. Building the reference panel for PRS-CSx is similar to building a panel for PRS-CS (https://github.com/getian107/PRScs/issues/20), although additional work is needed to link the reference panels across populations. I don't have an automated pipeline for this yet, but I would be happy to share some scripts if you are interested.
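No pipeline is given here. The following minimal sketch, in plain Python, illustrates two ingredients the reply mentions: computing within-block LD from genotype dosages, as for a PRS-CS panel, and making allele coding consistent across populations before their panels are linked. The function names `standardize`, `ld_block` and `allele_signs` are hypothetical. The sketch assumes dosages have already been extracted (e.g., from .bed files), missing genotypes handled, and LD block boundaries given. It does not produce the reference-panel files that PRS-CS/PRS-CSx actually read.

```python
import math


def standardize(dosages):
    """Center and scale one SNP's 0/1/2 dosages to mean 0 and variance 1."""
    n = len(dosages)
    mean = sum(dosages) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in dosages) / n)
    if sd == 0:
        raise ValueError("monomorphic SNP: drop it before computing LD")
    return [(d - mean) / sd for d in dosages]


def ld_block(snp_dosages):
    """Correlation (LD) matrix for the SNPs of one block in one population.

    snp_dosages: one list of dosages per SNP, over the same individuals,
    with missing genotypes already imputed or filtered out.
    """
    z = [standardize(row) for row in snp_dosages]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def allele_signs(pop_alleles):
    """Hypothetical helper to align allele coding across populations.

    pop_alleles: {population: {snp_id: (a1, a2)}}, dosages counting a1.
    The first population listing a SNP fixes its reference coding. Returns
    {population: {snp_id: +1 or -1}}; -1 means the coding is swapped, so that
    SNP's row and column in this population's LD matrix must be negated.
    SNPs whose allele pair does not match the reference are left out.
    """
    reference = {}
    signs = {}
    for pop, snps in pop_alleles.items():
        signs[pop] = {}
        for snp, (a1, a2) in snps.items():
            ref = reference.setdefault(snp, (a1, a2))
            if (a1, a2) == ref:
                signs[pop][snp] = 1
            elif (a2, a1) == ref:
                signs[pop][snp] = -1
    return signs
```

For example, `ld_block([[0, 1, 2, 1], [0, 1, 2, 2]])` gives a 2 x 2 matrix with ones on the diagonal and r ≈ 0.853 off the diagonal. Negating both the row and the column of a swapped SNP leaves its diagonal entry at 1, as it should.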

privefl commented 3 years ago

Should I then group UK + Italy + Poland under EUR, remove Iran, and use the UKBB LD reference panels that you provide?

getian107 commented 3 years ago

Yes, I think that might be a good starting point.
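The regrouping discussed above amounts to a relabelling step before samples are split by population. The labels "UK", "Italy", "Poland" and "Iran" follow the question. The function name `regroup` and the choice to pass any other label through unchanged are assumptions, since the remaining groups in table1 are not named in this thread.

```python
# Hypothetical mapping: fine-scale UK Biobank groups -> PRS-CSx super-population.
TO_SUPERPOP = {"UK": "EUR", "Italy": "EUR", "Poland": "EUR"}
EXCLUDED = {"Iran"}


def regroup(ancestry_by_sample):
    """Relabel samples by super-population, dropping excluded groups.

    Labels not listed in TO_SUPERPOP are kept unchanged (an assumption).
    """
    return {
        sample: TO_SUPERPOP.get(group, group)
        for sample, group in ancestry_by_sample.items()
        if group not in EXCLUDED
    }
```

The samples relabelled EUR would then be paired with the EUR LD reference panel built from the UKBB.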

getian107/PRScsx, issue #8