Note: PathOGiST is currently not compatible with OSX.
We recommend creating a conda environment for PathOGiST and installing PathOGiST through conda. First set up Bioconda as per the Bioconda setup instructions. PathOGiST requires Python 3.5 or newer. Create the environment with:
conda create --name pathogist
Then activate the environment and install PathOGiST:
source activate pathogist
conda install pathogist
When inside the pathogist conda environment, you can then simply run PATHOGIST -h, for example.
Note that you will need to install CPLEX separately, as CPLEX is proprietary software.
PATHOGIST run
This subcommand runs the PathOGiST pipeline from start to finish (i.e. distance matrix creation -> correlation clustering -> consensus clustering).
The main input file is a YAML configuration file, which you can create with the command
PATHOGIST run [path to where you want your config] --new_config
The configuration file will look like this.
Modify the configuration by adding paths to files, changing parameters, etc. You can add your own keys to the YAML configuration file, and delete the default keys which aren't relevant to your experiment.
Each input under the genotyping entries should be a file that contains the absolute paths to your call files, one per line.
For example, mlst_calls.txt should look something like:
/absolute/path/to/SRR00001.calls
/absolute/path/to/SRR00002.calls
/absolute/path/to/SRR00003.calls
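A list file like the one above can be generated rather than typed by hand. The following is a minimal sketch, not part of PathOGiST: the function name write_calls_list and the *.calls file pattern are assumptions made for illustration, so adjust the pattern to match how your call files are actually named.

```python
from pathlib import Path


def write_calls_list(calls_dir, list_path, pattern="*.calls"):
    """Write the absolute path of every matching call file, one per line.

    calls_dir -- directory holding the call files (hypothetical layout)
    list_path -- where to write the list, e.g. mlst_calls.txt
    pattern   -- glob pattern for call files (an assumption, not a PathOGiST rule)
    """
    # resolve() turns each match into an absolute path, as the config expects.
    paths = sorted(p.resolve() for p in Path(calls_dir).glob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The path of the resulting file is what you would then enter in the corresponding genotyping entry of the configuration.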
The output of PathOGiST is a TSV file containing the final consensus cluster assignment for each sample.
PATHOGIST correlation
This subcommand is for clustering bacterial samples based on a distance matrix.
The inputs to correlation clustering are a distance matrix and a threshold.
You can run correlation clustering with the following command:
PATHOGIST correlation [distance matrix] [threshold] [output path]
PATHOGIST distance
This subcommand is used for creating distance matrices from genotyping calls, e.g. SNPs, MLSTs, CNVs, etc. Currently, this subcommand is only compatible with SNP calls from Snippy, MLST calls from MentaLiST, and CNV calls from Prince. The inputs are a calls file and the type of the calls (one of SNP, MLST, or CNV).
The output is a distance matrix represented as a TSV file.
You can run this subcommand like so:
PATHOGIST distance [path/to/calls_file.tsv] [one of SNP/MLST/CNV] [output path]
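If you build distance matrices for several data types from a script, it can help to assemble the command in one place and check the data type before running anything. This is a rough sketch that assumes only the argument order shown above; the helper distance_command and the file names in the usage note are hypothetical, not part of PathOGiST.

```python
VALID_TYPES = ("SNP", "MLST", "CNV")


def distance_command(calls_file, data_type, output_path):
    """Return the argument list for one PATHOGIST distance invocation.

    The argument order (calls file, data type, output path) follows the
    command form shown in this README; nothing else is assumed.
    """
    if data_type not in VALID_TYPES:
        raise ValueError(f"data type must be one of {VALID_TYPES}, got {data_type!r}")
    return ["PATHOGIST", "distance", str(calls_file), data_type, str(output_path)]
```

Inside the activated pathogist environment, the returned list could be passed to subprocess.run(cmd, check=True), for instance with distance_command("mlst_calls.txt", "MLST", "mlst_dist.tsv").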
PATHOGIST consensus
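The input for consensus clustering is three files: a distances file, which lists the distance matrices (in .tsv format); a clusterings file, which lists the cluster assignments (in .tsv format); and a fine clusterings file.
The output is a TSV file containing the cluster assignments of the samples which are common to all the input distance matrices.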
You can run consensus clustering with the following command:
PATHOGIST consensus [distances] [clusterings] [fine_clusterings] [output path]
Each line of the input files should correspond to a specific data type, e.g. SNPs, MLSTs, or CNVs.
Absolute paths to distance matrices and cluster assignments should be prepended with the name of the clustering and an equal sign, i.e. [name]=[absolute path to file].
An example:
Distances file
SNP=/path/to/snp_dist
MLST=/path/to/mlst_dist
CNV=/path/to/cnv_dist
Clusterings file
SNP=/path/to/snp_clust
MLST=/path/to/mlst_clust
CNV=/path/to/cnv_clust
Fine clusterings file
SNP
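
The [name]=[absolute path to file] files in the example above are easy to get subtly wrong (a relative path, a missing equal sign). As a minimal sketch, assuming nothing beyond that line format, the hypothetical helpers below write such a file and read it back with basic checks. They are not part of PathOGiST.

```python
from pathlib import Path


def write_named_paths(mapping, out_path):
    """Write one 'name=/absolute/path' line per entry of mapping."""
    lines = []
    for name, path in mapping.items():
        p = Path(path)
        if not p.is_absolute():
            raise ValueError(f"{name}: path must be absolute, got {path!r}")
        lines.append(f"{name}={p}")
    Path(out_path).write_text("\n".join(lines) + "\n")


def read_named_paths(path):
    """Parse a 'name=path' file into a dict, rejecting malformed lines."""
    entries = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        name, sep, value = line.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"expected name=path, got {raw!r}")
        entries[name] = value
    return entries
```

A quick consistency check before running the consensus step is to confirm that the distances file and the clusterings file name the same data types, e.g. read_named_paths(distances).keys() == read_named_paths(clusterings).keys().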
To cite PathOGiST in publications, please use: