Baseline Model for Partitioning Analysis

sbguarch commented 6 years ago

Hi,

I want to check something. Following the tutorial, to compute the LD scores for my own annotations I used the list of HapMap3 SNPs and the PLINK data from the 1000G_EUR_Phase3_plink folder. However, when I ran the partitioned heritability analysis, adding each category independently, I got an error when using the LD scores from the 1000G_EUR_Phase3_baseline folder as the baseline. I realized the cause was that the baseline model LD scores were computed with a different SNP list. As I had already computed the LD scores for all my annotations, I just recomputed the LD scores for the baseline model using your annotations + 1000G_EUR_Phase3_plink + the HapMap3 snplist.

I don't see many differences between the z-scores if I do it the other way around (I checked for one category), using the SNP list from the 1000G_EUR_Phase3_baseline model instead of the one from HapMap3.

Is what I did OK?

rkwalters commented 6 years ago

Hi, my main worry would be if you ended up with significant differences in filtering on allele frequency or high-LD regions (e.g. the MHC). But if you’re seeing consistent results either way then it’s probably fine. Checking the correlation between your recomputed baseline LD scores and the provided baseline LD scores might give you additional verification of whether your version has any substantive differences. Cheers, Raymond
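
The correlation check suggested above could be sketched as follows. This is a minimal, standard-library-only sketch, not part of ldsc. It assumes the `.l2.ldscore.gz` files are gzipped, tab-delimited, and have a `SNP` column plus one column per annotation; the function names and the column name passed in (for example `baseL2`) are hypothetical and should be checked against the header of your own files.

```python
import csv
import gzip
import math


def read_ldscores(path, column):
    """Return {SNP id: LD score} for one annotation column.

    Assumes a gzipped, tab-delimited file with a header row containing
    a 'SNP' column and the requested annotation column.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["SNP"]: float(row[column]) for row in reader}


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_ldscores(provided_path, recomputed_path, column):
    """Correlate one annotation's LD scores over the SNPs both files share.

    Returns (number of shared SNPs, Pearson r).
    """
    provided = read_ldscores(provided_path, column)
    recomputed = read_ldscores(recomputed_path, column)
    shared = sorted(provided.keys() & recomputed.keys())
    r = pearson([provided[s] for s in shared],
                [recomputed[s] for s in shared])
    return len(shared), r
```

Running `compare_ldscores` once per chromosome file and per annotation column, and looking for any correlation noticeably below 1, is one way to spot a substantive difference. SNPs present in only one file are ignored here, so the shared count is worth checking too.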

yupenghe commented 6 years ago

Hi, I ran into the same issue when I was trying to test enrichment for new annotations. I found that some SNPs included in the baseline model are not listed as HapMap3 SNPs. If I pass the HapMap3 SNP list to --print-snps, the resulting ldscore.gz files cannot be combined with the baseline model for regression. What would you suggest for solving this issue?

One way I can think of is to use the SNP list of the baseline model instead of the HapMap3 SNPs. However, it is unclear to me how the SNP list in the baseline model was defined and why it includes SNPs that are not in HapMap3. Would you be willing to provide more information? Thanks!

Yupeng

giovp commented 6 years ago

Hi, same issue here. The discrepancy is small in my case (< 0.005% of SNPs on average for each chromosome). Would you suggest using the approach of @yupenghe?

Thanks, Giovanni
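
A discrepancy like the one described above can be measured by comparing the SNP IDs in two LD score files for the same chromosome. The sketch below is illustrative only and not part of ldsc. It assumes gzipped, tab-delimited `.l2.ldscore.gz` files with a `SNP` column, and the function names are hypothetical.

```python
import csv
import gzip


def read_snp_ids(path):
    """Return the set of SNP ids in a gzipped, tab-delimited LD score file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["SNP"] for row in reader}


def snp_discrepancy(baseline_path, custom_path):
    """Compare the SNP sets of two LD score files.

    Returns (SNPs only in baseline, SNPs only in custom,
    percent of the union that is not shared).
    """
    baseline = read_snp_ids(baseline_path)
    custom = read_snp_ids(custom_path)
    only_baseline = baseline - custom
    only_custom = custom - baseline
    union = baseline | custom
    if not union:
        return only_baseline, only_custom, 0.0
    percent = 100.0 * (len(only_baseline) + len(only_custom)) / len(union)
    return only_baseline, only_custom, percent
```

Listing the IDs that appear in only one file shows which SNP list each file was built from, which helps when deciding which list to pass to --print-snps.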

arushiv commented 6 years ago

I was using the HapMap3 SNP list from https://data.broadinstitute.org/alkesgroup/LDSCORE/hapmap3_snps.tgz and was getting a different list of SNPs in my LD score files compared to the baseline model LD score files. I then saw that the README.txt in https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_ldscores mentions that the HapMap SNP list is from https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2. I passed the SNP ID column of this file to the --print-snps flag, and the numbers now match up with the baseline model in https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baseline_v1.1_ldscores.tgz

Zepeng-Mu commented 4 years ago

I also noticed that there are two sets of HapMap3 lists, and this confuses me a lot.

aina91 commented 4 years ago

@arushiv Hi, when I replace the SNP file with w_hm3.snplist.bz2, I get this error: "ValueError: After merging with --print-snps, no SNPs remain."

    Reading list of 1217312 SNPs for which to print LD Scores from /Ref/w_hm3.snplist
    Traceback (most recent call last):
      File "ldsc.py", line 620, in <module>
        ldscore(args, log)
      File "ldsc.py", line 344, in ldscore
        raise ValueError('After merging with --print-snps, no SNPs remain.')
    ValueError: After merging with --print-snps, no SNPs remain.

My command is:

    ./ldsc.py \
        --print-snps /Ref/w_hm3.snplist \
        --ld-wind-cm 1.0 \
        --out /LD_File/1.1 \
        --bfile /Ref/1000G_EUR_Phase3_plink/1000G.EUR.QC.1 \
        --thin-annot \
        --annot /Annot/1.1.annot.gz \
        --l2

Could you give me some suggestions? Thanks.

haoyang-insitro commented 3 years ago

@aina91 you should pass just the "SNP" column (without the header), as @arushiv mentioned.
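
One way to produce such a file is sketched below. It assumes w_hm3.snplist is a whitespace-delimited table whose header row contains a column named `SNP`, which should be confirmed against your copy. The function name and output path are hypothetical.

```python
import bz2


def write_snp_ids(snplist_path, out_path):
    """Write one SNP id per line, with no header, to out_path.

    Reads either the plain file or the .bz2 download. Assumes the first
    line is a header containing a column named 'SNP'.
    Returns the number of ids written.
    """
    opener = bz2.open if snplist_path.endswith(".bz2") else open
    count = 0
    with opener(snplist_path, "rt") as src, open(out_path, "w") as dst:
        header = src.readline().split()
        snp_col = header.index("SNP")  # raises ValueError if the column is absent
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                dst.write(fields[snp_col] + "\n")
                count += 1
    return count
```

For example, `write_snp_ids("/Ref/w_hm3.snplist", "/Ref/w_hm3.snps_only.txt")` would write the IDs to a new file (the output name is illustrative), which can then be given to --print-snps in place of the full table.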

bulik / ldsc

Baseline Model for Partitioning Analysis #96