Comprehensive documentation for gwas-sumstats-tools is available in the GWAS SumStats Tools Documentation.
There are four commands: read, format, validate and gen_meta. (The gen_meta function is currently only accessible to internal GWAS Catalog users.)
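read is for previewing a sumstats file, and for:
-h: returning just the headers of the file
-M: returning all metadata
-m <field name>: getting metadata for the specified fields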
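To make the header preview concrete, here is a minimal Python sketch of reading the column names from a TSV file that may be gzipped. It is an illustration only, not the tool's implementation; the function name `read_header` is hypothetical.

```python
import csv
import gzip
from pathlib import Path


def read_header(path):
    """Return the column names from the first line of a TSV file.

    Hypothetical helper for illustration; files ending in .gz are
    opened with gzip, everything else as plain text.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # An empty file yields an empty header instead of raising.
        return next(reader, [])
```

Calling `read_header("GCST90278188.tsv")` would return the list of column names without loading the rest of the file.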
format is for formatting a sumstats file and creating a new one.
[!NOTE] Formatting is memory efficient and takes approximately 30 s per 1 million records.
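The -d option of the format command (see its options further down) describes how the delimiter is inferred from the file extension when none is given. The sketch below shows one way such a lookup could work. It is not the tool's code: the function name `guess_delimiter` is hypothetical, and treating a plain `.tsv` file and gzipped `.txt`/`.csv` files the same way as their counterparts is an assumption.

```python
from pathlib import Path

# Mapping taken from the description of the -d option.
# Assumption: a trailing .gz is ignored, and .tsv is treated like .tsv.gz.
_DELIMITER_BY_EXTENSION = {
    ".txt": "whitespace",
    ".csv": "comma",
    ".tsv": "tab",
}


def guess_delimiter(filename):
    """Return 'whitespace', 'comma' or 'tab' based on the file extension."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # look at the extension underneath the compression
    extension = suffixes[-1] if suffixes else ""
    try:
        return _DELIMITER_BY_EXTENSION[extension]
    except KeyError:
        # Unknown extension: the caller has to pass the delimiter explicitly.
        raise ValueError(f"cannot infer delimiter for {filename!r}") from None
```

Raising on an unknown extension mirrors the advice to specify the delimiter explicitly when it cannot be detected.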
gen_meta is for:
-m
--meta-in <file>: reading in a metadata file
-g: populating metadata from the GWAS Catalog
-e with --<FIELD>=<VALUE>: editing or adding values
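The `--<FIELD>=<VALUE>` convention used with -e can be illustrated with a short Python sketch that turns such arguments into a dictionary of edits. This is a sketch of the argument format only, under the assumption that each edit is passed as a single `--name=value` token; `parse_field_edits` is a hypothetical name and not part of gwas-ssf.

```python
def parse_field_edits(args):
    """Parse ['--FIELD=VALUE', ...] into {'FIELD': 'VALUE', ...}.

    Hypothetical helper for illustration only.
    """
    edits = {}
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --<FIELD>=<VALUE>, got {arg!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        field, value = arg[2:].split("=", 1)
        if not field:
            raise ValueError(f"missing field name in {arg!r}")
        edits[field] = value
    return edits
```

For example, `parse_field_edits(["--GWASID=GCST123456"])` gives `{"GWASID": "GCST123456"}`.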
validate is for validating a sumstats file.
$ pip3 install gwas-sumstats-tools
The following Docker command is equivalent to running gwas-ssf.
$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest
Just append any subcommands or arguments e.g.:
$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest validate
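When the Docker image is driven from a script, the same command line can be assembled programmatically. The following Python sketch builds the argument list shown above and appends any gwas-ssf subcommands. The function name `docker_command` and the `workdir` parameter are hypothetical; the image name and mount point are those from the command above.

```python
import os
import shlex

IMAGE = "ebispot/gwas-sumstats-tools:latest"


def docker_command(*gwas_ssf_args, workdir=None):
    """Build the `docker run` argument list equivalent to `gwas-ssf <args>`.

    Hypothetical helper: mounts `workdir` (default: the current directory)
    at /application, as in the shell command above.
    """
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "-it",
        "-v", f"{workdir}:/application",
        IMAGE,
        *gwas_ssf_args,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(docker_command("validate", "GCST90278188.tsv")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once Docker is available.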
$ gwas-ssf [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
validate
: Validate a sumstats file
format
: Format a sumstats file
gen_meta
: Generate meta-yaml file
read
: Read a sumstats file
gwas-ssf validate
Validate a sumstats file
Usage:
$ gwas-ssf validate [OPTIONS] FILENAME
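Two of the options listed below, the minimum row count (-m) and the handling of zero p-values (-z), can be pictured with a small Python sketch. It is a simplified illustration and not the validator that gwas-ssf runs: the function name `check_rows` is hypothetical, and it assumes a tab-separated input with a `p_value` column.

```python
import csv
import io


def check_rows(handle, min_rows=100_000, allow_p_zero=False):
    """Return a list of error messages for a tab-separated sumstats stream.

    Hypothetical, simplified check: only the row count and the p_value
    column are inspected.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames or "p_value" not in reader.fieldnames:
        return ["missing p_value column"]

    errors = []
    n_rows = 0
    for n_rows, row in enumerate(reader, start=1):
        try:
            p_value = float(row["p_value"])
        except (TypeError, ValueError):
            errors.append(f"row {n_rows}: p_value is not a number")
            continue
        if p_value == 0.0 and not allow_p_zero:
            errors.append(f"row {n_rows}: p_value is zero")

    if n_rows < min_rows:
        errors.append(f"file has {n_rows} rows, fewer than the minimum of {min_rows}")
    return errors


if __name__ == "__main__":
    sample = io.StringIO("p_value\n0.5\n0\n")
    for message in check_rows(sample, min_rows=2):
        print(message)
```

Collecting messages in a list rather than raising on the first problem matches the idea of writing all errors out to a file.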
Arguments:
FILENAME
: Input sumstats file. Must be TSV (may be gzipped) [required]
Options:
-e, --errors-out
: Output errors to a csv file
-z, --p-zero
: Force p-values of zero to be allowable. Takes precedence over inferred value (-i)
-m, --min-rows
: Minimum rows acceptable for the file [default: 100000]
-i, --infer-from-metadata
: Infer validation options from the metadata file
--help
: Show this message and exit.
gwas-ssf read
Read (preview) a sumstats file
Usage:
$ gwas-ssf read [OPTIONS] FILENAME
Arguments:
FILENAME
: Input sumstats file [required]
Options:
-h, --get-header
: Just return the headers of the file [default: False]
--meta-in PATH
: Specify a metadata file to read in (a default is used if omitted)
-M, --get-all-metadata
: Return all metadata [default: False]
-m, --get-metadata TEXT
: Get metadata for the specified fields e.g. -m genomeAssembly -m isHarmonised
--help
: Show this message and exit.
gwas-ssf format
Format a sumstats file and create a new one. Add/edit metadata.
Usage:
$ gwas-ssf format [OPTIONS] FILENAME
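The -s, --minimal2standard option described below assumes a minimum set of columns: p_value together with either variant_id (holding rsids) or both chromosome and base_pair_location. A minimal Python sketch of that precondition, using the hypothetical name `has_minimal_fields`, could look like this. It checks column names only and does not verify that variant_id actually contains rsids.

```python
def has_minimal_fields(header):
    """Check whether a header meets the minimal-format assumption.

    Hypothetical helper: requires p_value plus either variant_id or
    both chromosome and base_pair_location.
    """
    fields = set(header)
    has_variant_id = "variant_id" in fields
    has_position = {"chromosome", "base_pair_location"} <= fields
    return "p_value" in fields and (has_variant_id or has_position)
```

For instance, a header of `["variant_id", "p_value"]` passes, while `["chromosome", "p_value"]` does not because base_pair_location is missing.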
Arguments:
FILENAME
: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
-d, --delimiter Text
: Specify the delimiter in the file. If not specified, the delimiter is detected automatically: whitespace if your file is .txt, comma if your file is .csv, or tab if your file is *.tsv.gz. Otherwise, specify the delimiter so that the columns are recognised correctly
-r, --remove_comments Text
: Remove lines that start with the given character
-g, --generate_config Boolean
: Generate a configuration file for the file to be formatted
--config_out Path
: Specify the configuration JSON output file
-o, --ss-out PATH
: Output sumstats file
-a, --apply_config Boolean
: Apply the given configuration file to the file
-t, --test_config Boolean
: Test the given configuration file on the first 5 rows of the file
--config_in Path
: Specify a configuration JSON file to read in
-f, --analysis_software Text
: Specify the analysis software used to generate the summary statistics data
-s, --minimal2standard
: Try to convert a valid, minimally formatted file to the standard format. This assumes the file at least has p_value combined with rsid in the variant_id field, or chromosome and base_pair_location. Validity of the new file is not guaranteed because mandatory data could be missing from the original file. [default: False]
-b, --batch_apply Boolean
: Apply configuration files to a batch of summary statistics files
--lsf Boolean
: Run the batch process by submitting jobs via LSF
--slurm Boolean
: Run the batch process by submitting jobs via Slurm
gwas-ssf gen_meta
Generate a meta-yaml file for the existing sumstats file OR edit the existing meta-yaml file.
Usage:
$ gwas-ssf gen_meta [OPTIONS] FILENAME
Example:
# Generate a meta-yaml file from GWAS API (-g) with customised fields (-e --file_type=pre-gwas-ssf) for GCST90278188.tsv files
$ gwas-ssf gen_meta --meta-out GCST90278188.tsv-meta.yaml -g GCST90278188.tsv -e --file_type=pre-gwas-ssf
Arguments:
FILENAME
: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
--meta-out PATH
: Specify the metadata output file
-g, --meta-gwas
: Populate metadata from GWAS Catalog [default: False]
-e, --meta-edit
: Enable metadata edit mode. Then provide params to edit in the --<FIELD>=<VALUE> format e.g. --GWASID=GCST123456 to edit/add that value [default: False]
--help
: Show this message and exit.
This repository uses poetry for dependency and packaging management.
To run the tests:
git clone https://github.com/EBISPOT/gwas-sumstats-tools.git
cd gwas-sumstats-tools
python3 -m venv env
source env/bin/activate
pip install poetry
poetry install
poetry run pytest -s
To make a change:
branch from master -> PR to master -> poetry version -> git add pyproject.toml -> git commit -> git tag
A simple toolkit for reading and formatting GWAS sumstats files from the GWAS Catalog. Built with: