bvilhjal / ldpred

MIT License

gwas summary data format #24

Closed N-damo closed 5 years ago

N-damo commented 5 years ago

Dr. Bjarni J. Vilhjálmsson:

Thank you for your contribution to PRS research. Recently, I read your paper about the LDpred method, and I am very curious about its high accuracy in predicting HLA-related diseases. So I downloaded and installed ldpred.py from https://github.com/bvilhjal/ldpred.

When I prepared the available GWAS summary data, I ran into a big problem. The summary data format is chr>rsid>a1>a2>bp>beta>pvalue, and I am not sure how to prepare such summary data so that it is accepted by coord_genotypes.py. I know the parameter SSF_FORMAT accepts "STANDARD" (the default), "BASIC", "PGC", and "PGC_large". Furthermore, should I convert the beta values to odds ratios, as in the BASIC format? If so, how?

I would be very happy if you could give me any suggestions.

Best regards, Li'an Lin

bvilhjal commented 5 years ago

Dear Li'an Lin,

Thanks for your inquiry. If you want to train a polygenic risk score using GWAS summary statistics with LDpred, it's unfortunately currently a requirement to have the summary statistics in one of the accepted formats. Regarding the 'beta', I suggest using -log(odds ratio) as the beta. Don't worry about the magnitude or the scale, as the 'p-value' is used to infer the effect estimate. The 'beta' is really only used to infer the direction of the effect.
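
To make the relationship between 'beta', odds ratio, and p-value concrete, here is a minimal Python sketch. It is not LDpred's actual code. The function names are hypothetical, and it assumes a two-sided p-value from a normal test statistic. It uses the conventional beta = log(odds ratio); whether the sign needs flipping depends on which allele the odds ratio refers to, which this thread does not specify.

```python
import math
from statistics import NormalDist


def beta_to_odds_ratio(beta):
    """Convert a log-odds effect (beta) to an odds ratio."""
    return math.exp(beta)


def odds_ratio_to_beta(odds_ratio):
    """Convert an odds ratio back to a log-odds effect (beta)."""
    return math.log(odds_ratio)


def signed_z_from_p(beta, p_value):
    """Recover a z-score from a two-sided p-value, signed like beta.

    Only the sign of beta is used; its magnitude is ignored.
    Assumption: the p-value comes from a two-sided normal test.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    # inv_cdf(p/2) is the lower-tail quantile (<= 0), so negate it.
    z_abs = abs(NormalDist().inv_cdf(p_value / 2.0))
    return math.copysign(z_abs, beta)


if __name__ == "__main__":
    print(beta_to_odds_ratio(0.1))       # odds ratio for beta = 0.1
    print(signed_z_from_p(0.1, 0.05))    # positive z-score
    print(signed_z_from_p(-2.5, 0.05))   # same magnitude, negative sign
```

The last two calls return z-scores of equal magnitude and opposite sign even though the betas differ in size, which illustrates why the scale of 'beta' matters less than its direction when the p-value carries the strength of the association.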

I hope this helps.

Best regards, Bjarni