bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

SNP-heritability calculation requires consecutive chromosome number? #123

Open transgenomicsosu opened 6 years ago

transgenomicsosu commented 6 years ago

Hi,

I have genotype data in which only some of the chromosomes are covered. Is there a way to calculate SNP-heritability from non-consecutive chromosomes?

Thanks,

rkwalters commented 6 years ago

Hi,

It's probably not going to be a good idea to apply ldsc to that kind of data, but let's start with how you might be able to do it.

By default, ldsc will estimate the heritability per SNP (h2/M) based on your observed data, and then extrapolate that to the number of common SNPs genome-wide (as given by the M_5_50 files in the LD scores) to get the total SNP-h2. The software will let this run successfully even if you are missing large chunks out of the chromosomes (with the possible exception of warnings if you input too few SNPs).
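To make the extrapolation step concrete, here is a minimal sketch of the arithmetic, assuming the standard LD score regression relationship E[chi2] = 1 + N*a + (N*h2/M) * LD score. It uses plain unweighted least squares on noise-free synthetic data, which is not what ldsc itself does (ldsc uses a weighted regression with a block jackknife). The function names and all numbers are hypothetical and chosen only for illustration.

```python
def ols_slope_intercept(x, y):
    """Unweighted least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def h2_from_slope(slope, n_samples, m_snps):
    """The slope estimates N * h2 / M, so total h2 = slope * M / N."""
    return slope * m_snps / n_samples


# Hypothetical values, for illustration only.
N = 50_000        # GWAS sample size
M = 1_000_000     # number of SNPs the per-SNP h2 is extrapolated to
TRUE_H2 = 0.3

ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
# Noise-free chi-square statistics generated from the model above (a = 0).
chi2 = [1.0 + N * TRUE_H2 / M * l for l in ld_scores]

slope, intercept = ols_slope_intercept(ld_scores, chi2)
h2_hat = h2_from_slope(slope, N, M)
```

The point of the sketch is that the data only determine the slope, i.e. h2 per SNP scaled by N. The total h2 that gets reported depends directly on which M is plugged in afterwards.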

In the case of fully missing chromosomes, you should be able to avoid errors by creating LD scores that are merged into a single genome-wide file rather than split by chromosome. These reference and weight LD score files can then be specified using --ref-ld and --w-ld instead of --ref-ld-chr and --w-ld-chr, respectively. (It's possible there's some other minor tinkering involved, but as far as I recall that's the primary barrier.)
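One way the merge could be sketched in Python, using only the standard library. This assumes the per-chromosome files follow the usual `<prefix><chr>.l2.ldscore.gz` and `<prefix><chr>.l2.M_5_50` naming and that every LD score file has the same header line. `merge_ld_scores` is a hypothetical helper, not part of ldsc, and it has not been tested against real ldsc output.

```python
import gzip
import os


def merge_ld_scores(prefix, chroms, out_prefix):
    """Concatenate the per-chromosome LD score files that exist into one file.

    Writes <out_prefix>.l2.ldscore.gz and <out_prefix>.l2.M_5_50 and returns
    (number of SNP rows written, summed M_5_50 counts).
    """
    header = None
    m_total = None
    n_rows = 0
    with gzip.open(out_prefix + ".l2.ldscore.gz", "wt") as out:
        for chrom in chroms:
            path = f"{prefix}{chrom}.l2.ldscore.gz"
            if not os.path.exists(path):
                continue  # chromosome not covered by the data; skip it
            with gzip.open(path, "rt") as fh:
                file_header = fh.readline()
                if header is None:
                    header = file_header
                    out.write(header)  # keep a single header line
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in fh:
                    out.write(line)
                    n_rows += 1
            # M_5_50 holds one count per LD score column.
            with open(f"{prefix}{chrom}.l2.M_5_50") as fh:
                counts = [int(x) for x in fh.read().split()]
            if m_total is None:
                m_total = counts
            else:
                m_total = [a + b for a, b in zip(m_total, counts)]
    if m_total is None:
        raise FileNotFoundError(f"no LD score files found for {prefix}")
    with open(out_prefix + ".l2.M_5_50", "w") as fh:
        fh.write("\t".join(str(m) for m in m_total) + "\n")
    return n_rows, m_total
```

For example, `merge_ld_scores("ldscores/chr", range(1, 23), "ldscores/merged")` (hypothetical paths) would produce files that can be passed as `--ref-ld ldscores/merged`. Note that the merged M_5_50 here sums only the chromosomes that were found, so it is not a genome-wide count.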

In practice, we generally wouldn't recommend this because data with those kinds of gaps is rarely (if ever) a random selection of the genome. If the included regions aren't entirely random, then extrapolating their effect sizes to the rest of the genome (as ldsc will do) probably isn't justified. This is the precise reason we don't recommend the use of ldsc with data from targeted genotyping chips that are likely to prioritize variants enriched for association signals (exome chip, metabochip, immunochip, etc). (Conceivably you could adjust M to only extrapolate h2 per SNP to the number of variants in the regions covered by your data, but I'm not aware of anyone who has tested that approach.)
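The M adjustment mentioned in the parenthetical amounts to a simple rescaling, sketched below. This only illustrates the arithmetic of an approach that, as noted, has not been tested. `rescale_h2` and all the numbers are hypothetical. If your ldsc version has an `--M` option (worth confirming with `ldsc.py -h`), supplying M directly would be the more natural route than rescaling afterwards.

```python
def rescale_h2(h2_reported, m_used, m_target):
    """Rescale a total h2 estimate from one SNP count to another.

    Holds the per-SNP heritability (h2_reported / m_used) fixed and
    extrapolates it to m_target SNPs instead.
    """
    if m_used <= 0 or m_target < 0:
        raise ValueError("SNP counts must be positive")
    return h2_reported * m_target / m_used


# Hypothetical numbers, for illustration only.
h2_genome_wide = 0.40     # estimate extrapolated to the genome-wide M
m_genome_wide = 5_000_000
m_covered = 1_000_000     # common SNPs inside the regions actually genotyped

h2_covered = rescale_h2(h2_genome_wide, m_genome_wide, m_covered)
```

The rescaled value describes only the covered regions, and it still inherits any bias in the per-SNP estimate if those regions were selected for being enriched in association signal.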

As long as you have individual-level genotype data, one alternative you might consider is GREML (e.g. as implemented in GCTA). That will instead allow you to fit a model that focuses on the variance explained by your observed variants, without the question of extrapolating to the rest of the genome.

Cheers, Raymond
