ghm17 / LOGODetect

GNU General Public License v3.0

LOGODetect

LOGODetect (LOcal Genetic cOrrelation Detector) is a tool for identifying small genomic segments that harbor local genetic correlation between two traits. The updated software can also identify small regions with significant local genetic correlation across two populations.

Before starting

Single-population cross-trait analysis

LOGODetect requires reference genotype data and pre-computed LD scores. Use the following commands to download them:

wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/LOGODetect/LOGODetect_data.tar.gz
tar -zxvf LOGODetect_data.tar.gz

Applying LOGODetect


conda activate ldsc

# The flags from --max_nsnps onward are optional.
Rscript /LOGODetect.R \
--sumstats PATH_TO_SUMSTAT1,PATH_TO_SUMSTAT2 \
--n_gwas N1,N2 \
--ref_dir PATH_TO_REFERENCE \
--pop POPULATION \
--ldsc_dir PATH_TO_LDSC \
--block_partition PATH_TO_GENOME_PARTITION \
--out_dir PATH_TO_OUTFILE \
--max_nsnps CN \
--interval INTER \
--chr CHR \
--n_cores N_CORE

conda deactivate

where the inputs in order are

A concrete example

#!/bin/bash
## ---- Download the required reference panel and example data for LOGODetect ---- ##
wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/LOGODetect/LOGODetect_data.tar.gz
tar -zxvf LOGODetect_data.tar.gz
rm -rf LOGODetect_data.tar.gz

## ---- Applying LOGODetect ---- ##
cd LOGODetect_data

mkdir -p ./results

conda activate ldsc

Rscript /LOGODetect/LOGODetect.R \
--sumstats ./sumstats/BIP.txt,./sumstats/SCZ.txt \
--n_gwas 51710,105318 \
--ref_dir ./LOGODetect_1kg_ref \
--pop EUR \
--ldsc_dir /LOGODetect/ldsc \
--block_partition /LOGODetect/block_partition.txt \
--out_dir ./results \
--n_cores 25

conda deactivate
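
Before launching a run like the one above, it can be worth confirming that every input path exists, since a mistyped path would otherwise only surface after R has started. The following is a minimal sketch and not part of LOGODetect: `check_paths` is a hypothetical helper name, and the paths in the usage comment are simply the ones from the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of LOGODetect): report every missing path.
# Returns 0 if all given paths exist, 1 if at least one is missing.
check_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the paths from the example above:
# check_paths ./sumstats/BIP.txt ./sumstats/SCZ.txt ./LOGODetect_1kg_ref \
#     /LOGODetect/ldsc /LOGODetect/block_partition.txt || exit 1
```

Reporting all missing paths at once, rather than stopping at the first, avoids repeated fix-and-retry cycles.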

Output

LOGODetect outputs a whitespace-delimited text file LOGODetect_regions.txt in PATH_TO_OUTFILE specified by the user, with each row representing one small segment and the columns as such:

We have prepared the example output file for you in /LOGODetect_data/results/LOGODetect_regions.txt.
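
Because the result is a whitespace-delimited text file with one segment per row, standard shell tools are enough for a first look. The sketch below is an illustration only: `summarize_regions` is a hypothetical helper, and it assumes nothing about the column names or about whether the first row is a header, so the count it reports is simply the number of non-empty lines.

```shell
#!/bin/bash
# Hypothetical helper (not part of LOGODetect): quick look at a results file.
summarize_regions() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Count non-empty lines (this includes a header row, if the file has one).
    local n
    n=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$file")
    echo "non-empty lines: $n"
    # Show the first five lines with fields separated by single tabs.
    head -n 5 "$file" | awk -v OFS='\t' '{ $1 = $1; print }'
}

# Example usage:
# summarize_regions ./results/LOGODetect_regions.txt
```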

Cross-population analysis

Applying LOGODetect

# The --n_cores flag is optional.
Rscript LOGODetect.R \
--sumstats PATH_TO_SUMSTAT1,PATH_TO_SUMSTAT2 \
--n_gwas N1,N2 \
--ref_dir PATH_TO_REFERENCE \
--pop POPULATION1,POPULATION2 \
--block_partition PATH_TO_GENOME_PARTITION \
--gc_snp PATH_TO_SNPLIST \
--out_dir PATH_TO_OUTFILE \
--n_cores N_CORE

where the inputs in order are

A concrete example

#!/bin/bash
## ---- Download the example data ---- ##
wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/XWING/example/X-Wing_example.tar.gz
tar -zxvf X-Wing_example.tar.gz
rm -rf X-Wing_example.tar.gz

## ---- Download the required reference panel ---- ##
cd example/data

### EUR reference panel
wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/XWING/ref/LOGODetect/LOGODetect_1kg_EUR.tar.gz
tar -zxvf LOGODetect_1kg_EUR.tar.gz
rm -rf LOGODetect_1kg_EUR.tar.gz

### EAS reference panel
wget ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/XWING/ref/LOGODetect/LOGODetect_1kg_EAS.tar.gz
tar -zxvf LOGODetect_1kg_EAS.tar.gz
rm -rf LOGODetect_1kg_EAS.tar.gz

## ---- Applying LOGODetect ---- ##
cd ..

mkdir -p ./results/LOGODetect

Rscript /LOGODetect/LOGODetect.R \
--sumstats ./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt \
--n_gwas 359983,158284 \
--ref_dir ./data/LOGODetect_1kg_ref \
--pop EUR,EAS \
--block_partition /LOGODetect/block_partition.txt \
--gc_snp /LOGODetect/1kg_hm3_snp.txt \
--out_dir ./results/LOGODetect \
--n_cores 20
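
In the cross-population command, `--sumstats`, `--n_gwas`, and `--pop` each take two comma-separated values. A quick consistency check along those lines is sketched below. `count_fields` and `check_pairs` are hypothetical helper names, not part of LOGODetect, and the check only counts fields; it does not validate their contents.

```shell
#!/bin/bash
# Hypothetical helpers (not part of LOGODetect).

# Print the number of comma-separated fields in a string.
count_fields() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Succeed only if every argument has exactly two comma-separated fields.
check_pairs() {
    local arg
    for arg in "$@"; do
        if [ "$(count_fields "$arg")" -ne 2 ]; then
            echo "expected two comma-separated values, got: $arg" >&2
            return 1
        fi
    done
    return 0
}

# Example usage with the values from the example above:
# check_pairs "./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt" \
#     "359983,158284" "EUR,EAS" || exit 1
```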

Output

Citation

If you use LOGODetect, please cite:

Guo, H., Li, J. J., Lu, Q., & Hou, L. (2021). Detecting Local Genetic Correlations with Scan Statistics. Nature Communications.

Miao, J., Guo, H., Song, G., Zhao, Z., Hou, L., & Lu, Q. (2023). Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics. Nature Communications.

The genetic covariance estimation is adapted from ldsc; see Bulik-Sullivan, B., et al. (2015). An Atlas of Genetic Correlations across Human Diseases and Traits. Nature Genetics.

The LD block partition is adapted from LDetect; see Berisa, T., & Pickrell, J. K. (2016). Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics.