EBISPOT / gwas-utils

Collection of tools and scripts to manage GWAS Catalog related infrastructure.
https://www.ebi.ac.uk/gwas

gwas-utils

This repository is a collection of scripts and small applications that we use in the day-to-day running of the GWAS Catalog.

For a detailed description of the contents of this repository, see the individual readme files within each folder or the documentation on Confluence.

Installation/execution

Note first that many of the utils have a hard dependency on the curation database. This makes those utils hard to port: they cannot be run off the network (i.e. locally).

With Docker

docker run -it ebispot/gwas-utils <entry_point> [options]

e.g.

docker run -it ebispot/gwas-utils python /catalogPlots/gwas_cat_plus_ss.py

With conda

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
conda env create -f conda_env.yml
conda activate gwas-utils
pip install .

With virtualenv

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
python3 -m venv .venv
source .venv/bin/activate
pip install .

User/system wide

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
pip install .

Contents

After installation (see above), the tools below will be available. Usage, entry points and further documentation for each utility are given at the following links:

Plotter scripts

A collection of scripts we use to generate plots and statistics for the GWAS Catalog.

Curation utils

Historic curator scripts (merged in from https://github.com/EBISPOT/gwas-curation-utils)

Curator user manager

A tool to add, change or remove curator users in the database.

Data release QC tool

A tool that compares the databases and Solr as part of the quality control process. This script is called during the data release process.
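
As a rough illustration of what such a comparison involves, the sketch below diffs two lists of identifiers and reports what is missing on each side. It is not the QC tool's actual logic: the function name `compare_id_sets` and the accession lists are hypothetical, and a real run would fetch the identifiers from the database and from Solr.

```python
def compare_id_sets(database_ids, solr_ids):
    """Return the identifiers that are present on one side only."""
    database_ids, solr_ids = set(database_ids), set(solr_ids)
    return {
        "missing_from_solr": sorted(database_ids - solr_ids),
        "missing_from_database": sorted(solr_ids - database_ids),
    }


# Hypothetical accession lists, hard-coded here for illustration only.
report = compare_id_sets(
    ["GCST000001", "GCST000002", "GCST000003"],
    ["GCST000002", "GCST000003", "GCST000004"],
)
print(report)
```

An empty list on both sides would mean the two sources agree on the set of identifiers checked.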

Data export tool

A script to perform the data export task of the data release plan. It generates all downloadable files, names them properly, then generates a release-specific readme for the FTP folder.

Diagram creator for data release

A tool to handle issues with diagram generation: once the pussycat application is called, this script keeps checking both the process and the generation of the diagram, and also performs certain checks. It is also part of the data release process.
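
The "keep checking the process and the output" pattern can be sketched as below. This is only an assumption about the general approach, not the script's implementation: `run_and_watch`, the timeout values and the stand-in command are all hypothetical, and how pussycat is actually invoked is not shown here.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_and_watch(command, output_path, timeout_s=60.0, poll_interval_s=0.5):
    """Run `command`, poll until it exits or times out, then check the output file."""
    process = subprocess.Popen(command)
    deadline = time.monotonic() + timeout_s
    while process.poll() is None:
        if time.monotonic() > deadline:
            # Give up on a process that runs for too long.
            process.kill()
            process.wait()
            return False
        time.sleep(poll_interval_s)
    # Success requires a clean exit and a non-empty output file.
    return (
        process.returncode == 0
        and os.path.isfile(output_path)
        and os.path.getsize(output_path) > 0
    )


# Stand-in command that just writes a small file; it is NOT the real diagram generator.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "diagram.svg")
    stand_in = [
        sys.executable,
        "-c",
        "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text('<svg/>')",
        target,
    ]
    ok = run_and_watch(stand_in, target, timeout_s=30.0, poll_interval_s=0.1)
    print(ok)
```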

EPMC XML tools

EPMC API querying tool

Summary statistics folder manager

Tool to release summary stats folders to the FTP. This script is called during the data release process.

GWAS association filter

Tool for flagging peak associations in a distance-based fashion (merged in from https://github.com/EBISPOT/gwas-associationFilter)
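
One common way to do distance-based peak flagging is a greedy pass over the associations in order of significance. The sketch below shows that idea only; it is not taken from the filter's code. The function name `flag_peaks`, the field names, the 100 kb window and the example records are all assumptions made for illustration.

```python
def flag_peaks(associations, window_bp=100_000):
    """Greedy flagging: the most significant association within each window is the peak."""
    peaks = []  # (chromosome, position) of associations already flagged as peaks
    flags = {}
    for assoc in sorted(associations, key=lambda a: a["p_value"]):
        near_peak = any(
            chrom == assoc["chromosome"] and abs(pos - assoc["position"]) <= window_bp
            for chrom, pos in peaks
        )
        flags[assoc["id"]] = not near_peak
        if not near_peak:
            peaks.append((assoc["chromosome"], assoc["position"]))
    return flags


# Made-up records: "b" lies 50 kb from the more significant "a", so it is not a peak.
example = [
    {"id": "a", "chromosome": "1", "position": 1_000_000, "p_value": 1e-10},
    {"id": "b", "chromosome": "1", "position": 1_050_000, "p_value": 1e-8},
    {"id": "c", "chromosome": "1", "position": 1_500_000, "p_value": 5e-9},
    {"id": "d", "chromosome": "2", "position": 1_000_000, "p_value": 1e-9},
]
flags = flag_peaks(example)
print(flags)
```

Here "a", "c" and "d" are flagged as peaks, while "b" is not.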

Access log analysis

Scripts to analyse site access logs to generate statistics on user behaviour.

Remapper manager

Upon every new release of Ensembl, the full GWAS Catalog data has to be remapped to the new release. This tool helps by automating the step that triggers remapping.

Search term classifier

To generate site access stats it is useful to know what users are searching for. This script classifies search terms parsed out from site access statistics.
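
A simple way to classify search terms is an ordered list of regular expressions with a fallback category. The sketch below illustrates that approach under stated assumptions: the category names, the patterns and the `classify` function are hypothetical and may not match what the script actually does.

```python
import re

# Hypothetical categories, checked in order; the first matching rule wins.
RULES = [
    ("variant", re.compile(r"rs\d+", re.IGNORECASE)),
    ("study_accession", re.compile(r"GCST\d+", re.IGNORECASE)),
    ("pubmed_id", re.compile(r"\d{1,8}")),
    ("region", re.compile(r"(chr)?(\d{1,2}|X|Y|MT):\d+-\d+", re.IGNORECASE)),
]


def classify(term):
    """Return the label of the first rule matching the whole term, else 'free_text'."""
    cleaned = term.strip()
    for label, pattern in RULES:
        if pattern.fullmatch(cleaned):
            return label
    return "free_text"


for term in ["rs7329174", " GCST000001 ", "6:16000000-25000000", "breast cancer"]:
    print(term.strip(), "->", classify(term))
```

Counting the returned labels over all parsed terms then gives a breakdown of what users search for.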

Solr wrapper

This small Python module makes it easy to query, update and refresh the specified Solr instance/core.
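
For orientation, querying a Solr core over HTTP comes down to a request against its `select` handler. The sketch below uses only the Python standard library and is not the wrapper's API: `build_select_url` and `run_query` are hypothetical helpers, and the host, the core name `gwas` and the field `resourcename` are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, query, rows=10):
    """Build the URL of a Solr select request that returns JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def run_query(base_url, core, query, rows=10):
    """Send the request and return the matching documents (needs a reachable Solr)."""
    with urlopen(build_select_url(base_url, core, query, rows)) as response:
        return json.load(response)["response"]["docs"]


# Hypothetical host, core and field name; only the URL is built here, nothing is sent.
url = build_select_url("http://localhost:8983/solr", "gwas", "resourcename:study", rows=5)
print(url)
```

`run_query` is left uncalled because it needs a running Solr instance.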

FTP Summary Stats Script

Scripts to control the release of summary statistics files to the FTP.

Harmonisation Utils

Scripts to control the data flow from the submission app to the harmonisation pipeline.