bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0
652 stars 344 forks source link

No results file from partitioned heritability #254

Open zrcjessica opened 3 years ago

zrcjessica commented 3 years ago

I am trying to run ldsc.py to measure enrichment of GWAS risk variants in open chromatin regions of interest. I have generated the thinned *.annot.gz files for these regions and computed the LD scores as outlined in the wiki.

My command:

python ldsc.py \
--h2 {GWAS}.sumstats.gz \
--ref-ld-chr {MY.ANNOTATIONS}. \ 
--w-ld-chr weights_hm3_no_hla/weights. \
--out {OUTPUT} \
--print-coefficients

It runs without any error messages (except for a FutureWarning described in issue #224 ). However, I only get a .log output file and no .results file. This is what the .log file looks like:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--h2 /iblm/netapp/home/jezhou/rat_snatacseq_analysis/partition_h2/data/liu_alcohol_tobacco/ldsc_sumstats/DrinksPerWeek.sumstats.gz \
--ref-ld-chr /iblm/netapp/home/jezhou/rat_snatacseq_analysis/data/snATAC/ldsc_annot/Honeybadger2/Honeybadger2. \
--out Rat_Amygdala_cocaine_high-DrinksPerWeek \
--thin-annot  \
--w-ld-chr files/weights_hm3_no_hla/weights. \
--print-coefficients  

Beginning analysis at Tue Dec 15 10:28:58 2020
Reading summary statistics from /iblm/netapp/home/jezhou/rat_snatacseq_analysis/partition_h2/data/liu_alcohol_tobacco/ldsc_sumstats/DrinksPerWeek.sumstats.gz ...
Read summary statistics for 6611470 SNPs.
Reading reference panel LD Score from /iblm/netapp/home/jezhou/rat_snatacseq_analysis/data/snATAC/ldsc_annot/Honeybadger2/Honeybadger2.[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1161309 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from files/weights_hm3_no_hla/weights.[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1242190 SNPs.
After merging with reference panel LD, 1060200 SNPs remain.
After merging with regression SNP LD, 1049543 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.0267 (0.0018)
Lambda GC: 1.3169
Mean Chi^2: 1.4489
Intercept: 1.0577 (0.0099)
Ratio: 0.1285 (0.0221)
Analysis finished at Tue Dec 15 10:30:23 2020
Total time elapsed: 1.0m:24.14s

I also tried adding the baseline files to my analysis by modifying the --ref-ld-chr flag to:

--ref-ld-chr {MY.ANNOTATIONS}.,baseline.

However, I get the same issue reported in issue #233 .

My questions are:

  1. Do I need to run my custom annotations alongside the baseline, or is that unnecessary?
  2. If it is not necessary to run my custom annotations alongside the baseline, why am I not getting a .results file?
  3. If it is necessary to run my custom annotations alongside the baseline, then is there a less tedious workaround than regenerating the baseline files myself (solution reported in #233 )?
zrcjessica commented 3 years ago

When I looked at the source code, I see that the .results is only returned when ldsc.py is run with --overlap-annot. I also realized that all of my input files need to corresponding to 1000G Phase 3. However, why is it that --overlap-annot is required to generate a .results file even when I am only giving one set of annotations to --ref-ld-chr?

badoi commented 3 years ago

I've only been able to get the --print-snps argument to work with partitioned heritability and don't get a .results file like all everyone mentioned above. It took some wrangling, but the log files has most of the columns of the .results file that would otherwise be made with the --overlap-annot file. Here's a block of bash code to post-process the .log file into most of the columns of the .results file.

    awk -F '\t' '/Total Observed scale h2*/{flag=1;next}/Lambda GC/{flag=0}flag' my.log | \
    # append line with leading white space to previous
    sed ':a;N;$!ba;s/\n //g'| \
    # parse Partitioned heritability rownames, starts w/ colon
    awk -F":" -v OFS='\t' '{gsub("[[:space:]]", "_", $1);gsub(":", "", $0); print}' | \
    # transpose row to columns
    awk -v OFS='\t' '
    { 
        for (i=1; i<=NF; i++)  {
            a[NR,i] = $i
        }
    }
    NF>p { p = NF }
    END {    
        for(j=1; j<=p; j++) {
            str=a[1,j]
            for(i=2; i<=NR; i++){
                str=str"\t"a[i,j];
            }
            print str
        }
    }' | awk -v OFS='\t' '{gsub("_[0-9]+$", "", $1); gsub("L2$", "", $1); print}' | \
    gzip > my.results.gz