bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Using Meta-analysis Results Containing MetaboChip for LDSC #48

Closed longmanz closed 8 years ago

longmanz commented 8 years ago

Hello,

According to the LDSC paper, MetaboChip data are not suitable for LDSC (I think at least 1M SNPs are needed?).

Currently we have two meta-analysis datasets for LDSC genetic correlation analysis: stage 1 (all GWAS, ~2M SNPs) and combined (GWAS, ~2M SNPs, plus MetaboChip, ~0.3M SNPs).

I am thinking of using stage 1 only, but its N is merely 10k and the result has a relatively large SE. Using the combined dataset gives a seemingly better result, yet I am not sure whether that is appropriate.

Best wishes, Longda

rkwalters commented 8 years ago

Hi Longda, The requirement is probably somewhat lower than 1M SNPs (ldsc doesn’t warn about the number of SNPs until 200k), but MetaboChip has the additional challenge of being a non-random set of SNPs that are expected to be strongly enriched for effects (vs. a standard GWAS backbone). There’s been a lot of discussion of this general issue on the ldsc_users group as it relates to both MetaboChip and ExomeChip data, and I’d strongly encourage you to read some of those threads for additional background, but the consensus seems to be:

1) Univariate h2 estimates for MetaboChip data should be ignored entirely since they require unreasonable extrapolation from the selected SNPs to the rest of the genome.
2) For univariate analysis, the ratio statistic is still valid for evaluating the relative contribution of polygenic effects and population stratification to inflated lambda values.
3) Genetic correlation analysis using MetaboChip data is still valid.
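As a rough illustration of the ratio statistic mentioned in point 2, the sketch below computes it from an LD Score regression intercept and the mean chi-square, using the usual definition ratio = (intercept - 1) / (mean chi^2 - 1). The function name and the input values are hypothetical and chosen only for illustration; they do not come from this thread.

```python
def ldsc_ratio(intercept, mean_chisq):
    """Share of the inflation in mean chi-square attributed to confounding
    (e.g. population stratification) rather than polygenic signal.

    ratio = (intercept - 1) / (mean_chisq - 1)
    """
    if mean_chisq <= 1.0:
        # With no inflation in the test statistics the ratio is undefined.
        raise ValueError("mean chi-square must exceed 1 for the ratio to be defined")
    return (intercept - 1.0) / (mean_chisq - 1.0)


# Hypothetical inputs, not taken from any real analysis.
ratio = ldsc_ratio(intercept=1.02, mean_chisq=1.20)
print(f"ratio = {ratio:.2f}")
```

A ratio near 0 suggests the inflation is mostly polygenic, while a ratio near 1 suggests it is mostly confounding.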

This is less a rigorous proof than a rule of thumb, but it is at least the current line of thinking for ExomeChip/MetaboChip data. And for what it’s worth, N=10k should be sufficient for basic ldsc analyses, so it may be worth favoring the Stage 1 results if they are giving substantially different estimates than your combined data (i.e. not just larger SEs).
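One informal way to check whether two estimates differ by more than their standard errors would suggest is a z-test on the difference. The sketch below is only that: it treats the two estimates as independent, which is not true here because the Stage 1 samples are part of the combined data, so the test is likely conservative. The function name and the numbers are hypothetical.

```python
import math


def compare_estimates(est_a, se_a, est_b, se_b):
    """Z-test for the difference between two estimates.

    Assumes the estimates are independent (an assumption, not a property
    of overlapping datasets such as Stage 1 vs. combined).
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical genetic correlation estimates, for illustration only.
z, p = compare_estimates(est_a=0.45, se_a=0.15, est_b=0.50, se_b=0.08)
print(f"z = {z:.2f}, p = {p:.2f}")
```

If the difference is small relative to its standard error, the two datasets mainly differ in precision rather than in the estimate itself.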

Cheers, Raymond


Raymond K. Walters Postdoctoral Research Fellow Analytic & Translational Genetics Unit Massachusetts General Hospital rwalters@broadinsitute.org


longmanz commented 8 years ago

Dear Raymond,

Thank you so much for the prompt reply! Yes, I understand that now. Since there is no significant difference between the estimated correlation coefficients, I think I will keep using the stage 1 dataset.

Also I will definitely look into those threads. Thank you for your reminder.

Best wishes, Longda

rkwalters commented 8 years ago

Closing this thread as answered since there's been no additional activity in the past month. Follow-up questions are always welcome in the ldsc users group: https://groups.google.com/forum/#!forum/ldsc_users

Cheers, Raymond