bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Sample overlap #71

Closed zhaozhongzhu closed 7 years ago

zhaozhongzhu commented 7 years ago

Hello, The LDSC website mentions that, when estimating rg, I should use both the --intercept-h2 and --intercept-gencov flags to constrain the single-trait and cross-trait LD Score regression intercepts, respectively.

So my first question is: what value should I use to constrain --intercept-h2?

My second question is: how do I estimate the phenotypic correlation ρ for calculating ρNs/sqrt(N1N2)?

./ldsc.py \
  --rg as.sumstats.gz,al.sumstats.gz,br.sumstats.gz \
  --ref-ld-chr baseline_ld_chr/base.${chr}@ \
  --w-ld-chr baseline_ld_chr/base.${chr}@ \
  --intercept-h2 ???? \
  --intercept-gencov 0,???? \
  --out as_res

Thank you!

rkwalters commented 7 years ago

Hello, The intercepts should only be constrained if you have a very strong prior reason to believe they have the specified value. Generally we recommend leaving both the h2 and the gencov intercept unconstrained. The section on the website describing constrained intercepts is there to document how it can be done, not to suggest that it necessarily should be done.

If constraining --intercept-h2, the null value is 1 (i.e. the value expected if the GWAS has no confounding from population stratification or other [potentially unknown] factors).

The phenotypic correlation for --intercept-gencov should ideally be determined empirically, e.g. the observed correlation of the phenotypes in your data.
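As a rough illustration of the expression from the question, ρNs/sqrt(N1N2), the sketch below computes the value one would pass to --intercept-gencov. The function and argument names are hypothetical and are not part of ldsc; the numbers in the usage line are made up.

```python
import math


def expected_gencov_intercept(rho, n_overlap, n1, n2):
    """Expected cross-trait intercept: rho * Ns / sqrt(N1 * N2).

    rho       -- phenotypic correlation among the overlapping samples
    n_overlap -- number of individuals included in both GWAS (Ns)
    n1, n2    -- sample sizes of the two GWAS (N1, N2)
    All names here are illustrative assumptions, not ldsc identifiers.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    if not 0 <= n_overlap <= min(n1, n2):
        raise ValueError("overlap must be between 0 and min(n1, n2)")
    return rho * n_overlap / math.sqrt(n1 * n2)


# Hypothetical numbers: 5,000 shared individuals, GWAS sizes 10,000 and 20,000.
print(round(expected_gencov_intercept(0.3, 5000, 10000, 20000), 3))  # 0.106
```

With no overlap the expression is 0, and with complete overlap (Ns = N1 = N2) it reduces to ρ itself.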

Cheers, Raymond

zhaozhongzhu commented 7 years ago

Hi Raymond, Thank you for the response! In my situation, there is a lot of sample overlap between traits. This is why I am considering constraining the intercept, based on the description in the LDSC Nature Genetics paper.

  1. For estimating the phenotypic correlation, do you have a recommendation on which method to use (Pearson, etc.)?
  2. For constraining --intercept-h2, if there is sample overlap, what value should I put here?

./ldsc.py --rg as.sumstats.gz,al.sumstats.gz,br.sumstats.gz --ref-ld-chr baseline_ld_chr/base.${chr}@ --w-ld-chr baseline_ld_chr/base.${chr}@ --intercept-h2 ???? --intercept-gencov 0,???? --out as_res

rkwalters commented 7 years ago

Hi,

1) Pearson correlation is appropriate for quantitative traits. If you have binary traits, there is an alternative expression for the expected intercept based on the counts from a 2x2 table of the phenotypes (Proposition 2 in the supplement of the genetic correlation paper, though note the caveats in the preceding paragraph regarding assumptions about the ascertainment scheme).

2) --intercept-h2 is specific to each trait, so shouldn’t be affected by sample overlap.
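For quantitative traits, the Pearson correlation mentioned in point 1 could be computed on the individuals measured for both phenotypes, along these lines. This is only a sketch: the dictionaries keyed by sample ID and the function name are assumptions for illustration, and the binary-trait case from the supplement is not covered.

```python
import math


def pearson_on_overlap(pheno1, pheno2):
    """Pearson correlation restricted to samples present in both phenotypes.

    pheno1, pheno2 -- dicts mapping sample ID -> quantitative phenotype value
    Returns (r, number of overlapping samples).
    """
    shared = sorted(set(pheno1) & set(pheno2))
    if len(shared) < 2:
        raise ValueError("need at least two overlapping samples")
    x = [pheno1[s] for s in shared]
    y = [pheno2[s] for s in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a phenotype is constant among overlapping samples")
    return sxy / math.sqrt(sxx * syy), len(shared)


# Toy data: samples "a", "b", "c" are shared; "d" and "e" are not.
trait1 = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
trait2 = {"a": 2.0, "b": 4.0, "c": 6.0, "e": 0.0}
r, n_shared = pearson_on_overlap(trait1, trait2)
print(r, n_shared)
```

The returned overlap count is the Ns that enters the expected-intercept expression.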

I should also note that the presence of sample overlap doesn’t mean you have to constrain the intercept. You can leave it unconstrained and ldsc will freely estimate the intercept. As noted in the paper, the estimate of genetic correlation is unbiased even if you don’t constrain the intercept; constraining the intercept only buys you a decrease in SE, at the cost of making much stronger assumptions.

Cheers, Raymond

zhaozhongzhu commented 7 years ago

Hi Raymond, Thank you for the great explanation! When you say "leave it unconstrained and ldsc will freely estimate the intercept", do you mean that ldsc will use the intercept it estimates itself and automatically adjust the rg calculation accordingly, even if I don't use any option to constrain the intercept? I also noticed there is an option called "--no-intercept"; is this different from not putting any intercept option in the command?

Best ZZ

rkwalters commented 7 years ago

Hi, Yes, that’s correct: ldsc will estimate the intercept and adjust the rg accordingly if you don’t specify the intercept.

This is not the same as the --no-intercept option; that flag is equivalent to setting --intercept-h2 to 1 and --intercept-gencov to 0 for all phenotypes.
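To make the equivalence concrete, the sketch below spells out the explicit flags that --no-intercept would stand in for, given the number of traits passed to --rg. The helper name is hypothetical, and the one-value-per-trait, comma-separated layout is an assumption based on the commands earlier in this thread.

```python
def explicit_no_intercept_flags(n_traits):
    """Flags described above as equivalent to --no-intercept.

    Assumes one comma-separated value per trait listed in --rg:
    every h2 intercept fixed at 1, every gencov intercept fixed at 0.
    """
    if n_traits < 2:
        raise ValueError("--rg needs at least two traits")
    h2 = ",".join(["1"] * n_traits)
    gencov = ",".join(["0"] * n_traits)
    return ["--intercept-h2", h2, "--intercept-gencov", gencov]


# Three traits, as in the command from the original question.
print(" ".join(explicit_no_intercept_flags(3)))
# --intercept-h2 1,1,1 --intercept-gencov 0,0,0
```

Omitting all three flags is the unconstrained case, in which ldsc estimates the intercepts itself.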

Cheers, Raymond

zhaozhongzhu commented 7 years ago

Thank you Raymond!

rkwalters commented 7 years ago

Hi, I'm closing this issue thread as resolved, but if you have any more issues, feel free to follow up here or via the Google group. Cheers, Raymond