MRCIEU / opengwas-reports

Report module for IEU GWAS pipeline

Incorporate info from LDSC and clumping #7

Closed explodecomputer closed 5 years ago

explodecomputer commented 5 years ago

there will be three input files

clump.txt is a list of tophits, that looks like this:

rs1421085
rs6567160
rs13021737
rs13130484
rs543874
rs943005
rs11030104
rs7531118
rs7138803
rs10182181
rs3888190
rs1516725
rs11672660
rs13329567
rs3817334
rs2112347
rs7144011
rs13078960
rs2183825

This needs to be reported in the JSON file as clumped_hits=<nrow clump.txt>, i.e. the number of rows in clump.txt.
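A minimal Python sketch of that count, for illustration only. The function names are hypothetical, and it assumes that clump.txt holds one rsID per line with no header and that the target is a flat JSON object (the thread later lists a qc_metrics.json, but the exact file and layout are assumptions here).

```python
import json
from pathlib import Path


def count_clumped_hits(clump_path):
    """Count non-empty lines in clump.txt (assumed: one rsID per line, no header)."""
    text = Path(clump_path).read_text()
    return sum(1 for line in text.splitlines() if line.strip())


def write_clumped_hits(clump_path, json_path):
    """Add or update the clumped_hits key in a flat JSON metrics file."""
    json_file = Path(json_path)
    # Start from the existing metrics if the file is already there.
    metrics = json.loads(json_file.read_text()) if json_file.exists() else {}
    metrics["clumped_hits"] = count_clumped_hits(clump_path)
    json_file.write_text(json.dumps(metrics, indent=2))
    return metrics
```

Blank lines are skipped so that a trailing newline at the end of clump.txt does not inflate the count.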

The LDSC file looks like this:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 /data/ldsc.txt.temp \
--ref-ld-chr /ref/eur_w_ld_chr/ \
--out /data/ldsc.txt \
--w-ld-chr /ref/eur_w_ld_chr/

Beginning analysis at Tue Feb  5 23:41:16 2019
Reading summary statistics from /data/ldsc.txt.temp ...
Read summary statistics for 1175121 SNPs.
Reading reference panel LD Score from /ref/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /ref/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1172415 SNPs remain.
After merging with regression SNP LD, 1172415 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.1346 (0.0055)
Lambda GC: 1.1154
Mean Chi^2: 1.3113
Intercept: 0.7043 (0.0069)
Ratio < 0 (usually indicates GC correction).
Analysis finished at Tue Feb  5 23:41:28 2019
Total time elapsed: 11.92s

Print verbatim in the report

For the JSON, the following need to be parsed:
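The list of fields is not shown in this thread. As a hedged sketch, assuming the values of interest are the summary lines near the end of the log (h2, Lambda GC, Mean Chi^2, Intercept, and the final SNP count), they could be pulled out with regular expressions in Python. The key names and the function name below are hypothetical, not taken from the report module.

```python
import re

# A plain or scientific-notation number, e.g. 0.1346 or 1e-05.
NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"

# Lines of the form "Label: value (standard error)".
WITH_SE = {
    "h2": re.compile(r"^Total Observed scale h2:\s*" + NUM + r"\s*\(" + NUM + r"\)"),
    "intercept": re.compile(r"^Intercept:\s*" + NUM + r"\s*\(" + NUM + r"\)"),
}
# Lines of the form "Label: value".
VALUE_ONLY = {
    "lambda_gc": re.compile(r"^Lambda GC:\s*" + NUM),
    "mean_chisq": re.compile(r"^Mean Chi\^2:\s*" + NUM),
}
N_SNPS = re.compile(r"^After merging with regression SNP LD, (\d+) SNPs remain")


def parse_ldsc_log(text):
    """Extract summary metrics from LDSC log text; keys not found are omitted."""
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        for key, pattern in WITH_SE.items():
            m = pattern.match(line)
            if m:
                metrics[key] = float(m.group(1))
                metrics[key + "_se"] = float(m.group(2))
        for key, pattern in VALUE_ONLY.items():
            m = pattern.match(line)
            if m:
                metrics[key] = float(m.group(1))
        m = N_SNPS.match(line)
        if m:
            metrics["n_snps"] = int(m.group(1))
    return metrics
```

The Ratio line is left out because in the log above it is a text message ("Ratio < 0 ...") rather than a number. A log from a run that stopped before the estimation step would yield no h2 key, which gives callers a simple way to detect an incomplete run.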

YiLiu6240 commented 5 years ago

In gwas_processing, clump.py works fine, but ldsc.py fails and I could not get a complete ldsc.txt.log.

Here is the log:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--h2 /data/2/data.bcf \
--ref-ld-chr /ref/eur_w_ld_chr/ \
--out /data/2/ldsc.txt \
--snplist /ref/snplist.gz \
--w-ld-chr /ref/eur_w_ld_chr/ 

Beginning analysis at Sun Feb 10 19:44:50 2019
Reading summary statistics from /data/2/data.bcf ...
and extracting SNPs specified in /ref/snplist.gz ...
Read summary statistics for 1090052 SNPs.
Reading reference panel LD Score from /ref/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /ref/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1077077 SNPs remain.
After merging with regression SNP LD, 1077077 SNPs remain.
Traceback (most recent call last):
  File "/ldsc/ldsc.py", line 647, in <module>
    sumstats.estimate_h2(args, log)
  File "/ldsc/ldscore/sumstats.py", line 348, in estimate_h2
    chisq = s(sumstats.Z**2)
  File "/opt/conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/ops.py", line 721, in wrapper
    result = wrap_results(safe_na_op(lvalues, rvalues))
  File "/opt/conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/ops.py", line 692, in safe_na_op
    lambda x: op(x, rvalues))
  File "pandas/_libs/algos_common_helper.pxi", line 1212, in pandas._libs.algos.arrmap_object
  File "/opt/conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/ops.py", line 692, in <lambda>
    lambda x: op(x, rvalues))
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

Analysis finished at Sun Feb 10 19:45:20 2019
Total time elapsed: 29.48s

explodecomputer commented 5 years ago

Thanks @YiLiu6240, I have updated the repository and it should work now.
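The thread does not show what was changed. For context, the TypeError in the log above says that `**` was applied to a `str` and an `int`, which means the Z values reached `sumstats.Z**2` as text rather than numbers. The sketch below reproduces that failure mode in plain Python (not pandas, and not the actual fix) and shows the effect of coercing to float first. The `squared_z` helper is hypothetical.

```python
def squared_z(z_values):
    """Square Z scores after coercing each value to float."""
    return [float(z) ** 2 for z in z_values]


# Z scores held as text, as the traceback suggests happened in the failing run.
z_as_text = ["1.5", "-2.0", "0.5"]

message = None
try:
    [z ** 2 for z in z_as_text]  # squaring a str raises TypeError
except TypeError as err:
    message = str(err)

# After coercion the squares (chi-square statistics) can be computed.
chisq = squared_z(z_as_text)
```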

YiLiu6240 commented 5 years ago

Hi @explodecomputer, metrics from ldsc have been added to the report. See on epi-franklin:

/projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/results/report-ldsc.html

explodecomputer commented 5 years ago

@YiLiu6240 This is wonderful, really impressed =] There was no JSON file in that directory, but presumably one is being generated as well?

YiLiu6240 commented 5 years ago

@explodecomputer The current file structure for 2 is available at /projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/mrbase-report-module/2 as

data.bcf
metadata.json
ldsc.txt.log
data.bcf.csi
qc_metrics.json
clump.txt
report.html

explodecomputer commented 5 years ago

Great!