Copyright holder: EMBL-European Bioinformatics Institute (Apache 2 License)
This script is designed to automatically fine-map and highlight the causal variants behind GWAS results by cross-examining GWAS, population genetic, epigenetic and cis-regulatory datasets.
Its original design was based on STOPGAP. It takes as input a disease identifier, extracts associated SNPs via GWAS databases, expands them by LD, then searches an array of regulatory and cis-regulatory databases for gene associations.
If you wish to shortcut all of the instructions below, you can simply use our VirtualBox virtual machine.
Add the lib/ directory to your $PYTHONPATH environment variable.
The script scripts/installation/ubuntu_environment.sh describes a recipe to install all basic C and Python dependencies on a fresh Ubuntu server (requires root access).
To install all bioinformatics dependencies, run:
sh scripts/installation/install_dependencies.sh
Add the ./bin/ directory to your $PATH environment variable.
The following script downloads a number of files into $PWD:
sh scripts/installation/download.sh
Ideally, save these files in a separate directory, which we will call databases_dir.
The following will create a databases_dir directory for you:
cd scripts/build_data_files
make download
make process
Warning: this may take days, as it needs to split the entire 1000 Genomes files by population.
Every time you run POSTGAP, add --database_dir /path/to/databases_dir to the command line, where the path points to the directory created above.
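Since the --database_dir option has to be repeated on every run, a small wrapper can add it for you. The following is only a minimal sketch: the helper names postgap_command and run_postgap are hypothetical, and it assumes that the python executable on your $PATH is the one you use for POSTGAP and that you launch it from the root directory.

```python
import subprocess


def postgap_command(postgap_args, database_dir):
    """Build a POSTGAP command line that always carries --database_dir."""
    # postgap_args holds the usual options, e.g. ["--disease", "autism"].
    return (["python", "POSTGAP.py"]
            + list(postgap_args)
            + ["--database_dir", database_dir])


def run_postgap(postgap_args, database_dir):
    """Run POSTGAP from the root directory; raises CalledProcessError on failure."""
    subprocess.run(postgap_command(postgap_args, database_dir), check=True)
```

For example, run_postgap(["--disease", "autism", "--population", "EUR"], "/path/to/databases_dir") would launch the default command shown below with the database directory appended.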
By default, run the following command from the root directory:
python POSTGAP.py --disease autism --population EUR
Multiple disease names can be provided.
You can also provide a list of EFOs:
python POSTGAP.py --efos EFO_0000196
Or an rsID:
python POSTGAP.py --rsID rs10009124
Or a manually defined variant:
python POSTGAP.py --coords my_variant 1 1234567
To bypass the GWAS databases and supply your own data from a file:
python POSTGAP.py --summary_stats tests/sample_data/example.tsv
The summary statistics file should be tab-delimited and follow the GWAS Catalog recommendations.
In particular, it must have the following columns:
For an EFO, you can trigger the Bayesian calculations with:
python POSTGAP.py --efos EFO_0000196 --bayesian --output2 output2.txt
In this case, POSTGAP produces a tab-delimited output file, 'output2.txt'. The columns represent:
It can be displayed as:
python scripts/present_results/postgap_html_report.py --result_file output2.txt --template scripts/present_results/geneReport.html --output report.html
By default, the script writes a tab-delimited file to standard output.
If you wish, you can redirect this into a file:
python POSTGAP.py --disease autism --output results.txt
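If you want to inspect the tab-delimited results from Python, something along these lines may help. It is a sketch only: the function names are hypothetical, and it assumes that the first line of the file is a header row naming the columns, which you should verify against your own output.

```python
import csv


def parse_results(lines):
    """Parse tab-delimited lines into a list of dicts keyed by the header row."""
    # Assumption: the first line is a header naming each column.
    reader = csv.DictReader(lines, delimiter="\t")
    return [dict(row) for row in reader]


def read_results(path):
    """Read a results file such as results.txt written with --output."""
    with open(path, newline="") as handle:
        return parse_results(handle)
```

Calling read_results("results.txt") then returns one dictionary per output row, with all values kept as strings.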
If you want a JSON dump of all the data retrieved by the pipeline:
python POSTGAP.py --disease autism --output results.json --json
python POSTGAP.py --disease autism --json
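The layout of the JSON dump is not described here, so a cautious first step is to load it and look at its top-level shape before relying on any particular keys. A minimal sketch, with hypothetical helper names:

```python
import json


def describe_json(text):
    """Return the top-level type name and size of a JSON document."""
    data = json.loads(text)
    # Only containers have a meaningful size; scalars report None.
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size


def describe_json_file(path):
    """Describe a dump such as results.json written with --output and --json."""
    with open(path) as handle:
        return describe_json(handle.read())
```

describe_json_file("results.json") tells you whether the dump is a list or a dictionary and how many top-level entries it holds.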
You can check the output with the following commands, using the test data.
Check out our Wiki