STRONG resolves strains on assembly graphs by resolving variants on core COGs using co-occurrence across multiple samples.
Installation
Quick Start
Usage
Config File
Detailed Pipeline
Synthetic community data
The following pieces of software should be installed on your machine before attempting to install STRONG.
On a standard Ubuntu 16.04 distribution, they can be installed with:
sudo apt-get update
sudo apt-get -y install libbz2-dev libreadline-dev cmake g++ zlib1g zlib1g-dev
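Before moving on, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch and not part of STRONG: the helper name check_tools is hypothetical, and the commands checked (cmake and g++) are simply taken from the apt-get line above.

```shell
# Hypothetical helper (not part of STRONG): report any command missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Build tools named in the apt-get line above.
check_tools cmake g++ || echo "install the missing tools before continuing"
```

The function returns a non-zero status when at least one tool is absent, so it can also be used as a guard at the top of a personal install script.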
We then need to install miniconda; we recommend the Python 3.8 version. To install miniconda, follow the instructions here. Remember that conda activation may require logging out and back in again.
STRONG can be installed anywhere, but in what follows we assume it will be placed in a location SPATH that you set as an environment variable:
export SPATH=/mypath/to/repos
cd $SPATH
We begin by cloning STRONG recursively:
git clone --recurse-submodules https://github.com/chrisquince/STRONG.git
STRONG contains DESMAN and BayesPaths as submodules.
If you need to update in future:
cd STRONG
git submodule foreach git pull origin master
For convenience, all the steps described below are bundled in the install_STRONG.sh script. It is mostly silent, and all logs are written to install.log. The script does not, however, install any databases; for those, please refer to the corresponding section: Database needed (COG).
Inside the STRONG directory, type the following command:
./install_STRONG.sh
We recommend that you first compile the SPAdes and COG tools executables outside of conda:
cd ./STRONG/SPAdes/assembler
./build_cog_tools.sh
The full list of requirements is given in the file conda_env.yaml. We recommend mamba for the installation; it can itself be installed through conda with:
conda install -c conda-forge mamba
Then we use mamba to resolve the STRONG environment from within the STRONG home directory:
cd $SPATH/STRONG
mamba env create -f conda_env.yaml
This should take 5 - 10 minutes with mamba.
Once the STRONG environment has been installed, activate it with the following command:
conda activate STRONG
It is also necessary to install the BayesPaths executable within the STRONG conda environment:
cd BayesPaths
python ./setup.py install
And also DESMAN:
cd ../DESMAN
python ./setup.py install
BayesPaths uses precompiled executables in the runfg_source directory. These are only compatible with Linux x86-64; on other platforms they must be compiled from source. See the BayesPaths repo for details.
Unfortunately, there is a bug in the conda CONCOCT package caused by updates to Pandas. This needs to be fixed before running the pipeline:
CPATH=`which concoct_refine`
sed -i 's/values/to_numpy/g' $CPATH
sed -i 's/as_matrix/to_numpy/g' $CPATH
sed -i 's/int(NK), args.seed, args.threads)/ int(NK), args.seed, args.threads, 500)/g' $CPATH
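Because the commands above edit an installed script in place, it may be worth keeping a copy of the original. The sketch below wraps the same three substitutions in a hypothetical helper, patch_concoct_refine, which is not part of STRONG or CONCOCT. The demonstration runs on a scratch file, and the sample line in it is made up for illustration.

```shell
# Hypothetical helper (not part of STRONG or CONCOCT): keep a copy of the
# original script, then apply the same three substitutions as above.
patch_concoct_refine() {
    target="$1"
    cp "$target" "$target.orig"
    sed -e 's/values/to_numpy/g' \
        -e 's/as_matrix/to_numpy/g' \
        -e 's/int(NK), args.seed, args.threads)/ int(NK), args.seed, args.threads, 500)/g' \
        "$target.orig" > "$target"
}

# Demonstration on a scratch file; the line below is made up for illustration.
demo=$(mktemp)
echo 'cov = df.as_matrix()' > "$demo"
patch_concoct_refine "$demo"
cat "$demo"                          # the as_matrix call now reads to_numpy
diff "$demo.orig" "$demo" || true    # review what changed
```

On a real install, the helper would be called as patch_concoct_refine "$(which concoct_refine)" with the STRONG environment active. Writing the result back into the existing file keeps its executable permission.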
There is a bug in the current conda install of R: the lapack library is present, but not exactly where it needs to be for all required libraries to work. This is easily fixed with a symbolic link:
ln -s $CONDA_PREFIX/lib/R/modules/lapack.so $CONDA_PREFIX/lib/R/modules/libRlapack.so
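Running the ln command a second time fails because the link already exists. A guarded variant is sketched below; the helper name link_rlapack is hypothetical, and the demonstration uses a scratch directory that only mimics the lib/R/modules layout from the command above.

```shell
# Hypothetical helper: create the libRlapack.so link only if it is missing.
link_rlapack() {
    modules="$1/lib/R/modules"
    if [ ! -e "$modules/lapack.so" ]; then
        echo "lapack.so not found in $modules" >&2
        return 1
    fi
    if [ -e "$modules/libRlapack.so" ]; then
        echo "libRlapack.so already present, nothing to do"
    else
        ln -s "$modules/lapack.so" "$modules/libRlapack.so"
    fi
}

# Dry run against a scratch directory that mimics the conda layout.
prefix=$(mktemp -d)
mkdir -p "$prefix/lib/R/modules"
touch "$prefix/lib/R/modules/lapack.so"
link_rlapack "$prefix"
link_rlapack "$prefix"   # second call is a no-op
```

With the STRONG environment active, the equivalent call would be link_rlapack "$CONDA_PREFIX".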
We will also need a version of the COG database installed. We make this available for download, and it can be placed anywhere. Here we point the DB_PATH variable to its location, which should be chosen appropriately:
export DB_PATH=/path/to_my/database
cd $DB_PATH
wget https://microbial-metag-strong.s3.climb.ac.uk/rpsblast_cog_db.tar.gz
tar -xvzf rpsblast_cog_db.tar.gz
rm rpsblast_cog_db.tar.gz
GTDB is optionally used in the last part of the pipeline for MAG classification. If a GTDB path is given in the config file, STRONG performs a simple check for its presence and downloads it if it is absent. We recommend preinstalling it, as the download may take a while:
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/auxillary_files/gtdbtk_r95_data.tar.gz
tar xvzf gtdbtk_r95_data.tar.gz
rm -r db
mv release95 db
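The rm -r db step above reports an error when no db directory exists yet, and it removes an existing db even if the extraction failed. A more defensive variant is sketched below. The helper name promote_release is hypothetical, and the demonstration runs in a scratch directory standing in for the real GTDB location.

```shell
# Hypothetical helper: replace db with the freshly extracted release95,
# but only when release95 actually exists in the current directory.
promote_release() {
    if [ ! -d release95 ]; then
        echo "release95 not found, leaving db untouched" >&2
        return 1
    fi
    rm -rf db
    mv release95 db
}

# Dry run in a scratch directory standing in for the GTDB location.
scratch=$(mktemp -d)
cd "$scratch"
mkdir release95
promote_release
ls
```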
Some issues may crop up with R libraries and/or a forgotten installation step. These can be checked for by running SnakeNest/scripts/check_on_dependencies.py.
STRONG requires a lot of software; at the moment we recommend using the conda recipe above.
First we download a fairly simple synthetic test data set, built from known microbial strains, into a separate directory, /mypath/torunthings/STRONG_Runs, which we will also use for STRONG output:
export SRPATH=/mypath/torunthings/STRONG_Runs
mkdir $SRPATH
cd $SRPATH
wget https://microbial-metag-strong.s3.climb.ac.uk/Test.tar.gz
tar -xvzf Test.tar.gz
rm Test.tar.gz
We are now ready to run STRONG from within the STRONG directory. Two example YAML files are provided in the SnakeNest directory: for a high-quality run on real data, start from config.yaml, but for this simple example use test_config.yaml, which assumes a maximum of 5 strains per MAG, as explained below. This file needs to be edited first. The following edits are necessary. The data field to:
data: /mypath/torunthings/STRONG_Runs/Test
The cog_database field to:
cog_database: /path/to_my/database/rpsblast_cog_db/Cog
The evaluation genomes field, which points to the known genomes to validate against:
genomes: /mypath/torunthings/STRONG_Runs/Test/Eval
For real data, this step would be deactivated by setting 'execution: 0'.
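These edits can be made in any text editor. For scripted setups, a sed-based sketch is shown below. It assumes that each key sits on its own line in the form "key: value", possibly indented, and occurs only once. The helper name set_yaml_field is hypothetical, and the scratch file is only a stand-in, since the actual layout of test_config.yaml is not reproduced here.

```shell
# Hypothetical helper: rewrite the line holding "key:" to "key: value",
# preserving indentation. Assumes each key occurs once, on its own line.
set_yaml_field() {
    file="$1"; key="$2"; value="$3"
    sed "s|^\([[:space:]]*\)${key}:.*|\1${key}: ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Scratch stand-in for test_config.yaml; the real file has more fields.
cfg=$(mktemp)
printf 'data: /old/data\ncog_database: /old/Cog\nevaluation:\n    execution: 1\n    genomes: /old/Eval\n' > "$cfg"

set_yaml_field "$cfg" data /mypath/torunthings/STRONG_Runs/Test
set_yaml_field "$cfg" cog_database /path/to_my/database/rpsblast_cog_db/Cog
set_yaml_field "$cfg" genomes /mypath/torunthings/STRONG_Runs/Test/Eval
cat "$cfg"
```

Anchoring the pattern at the start of the line means that setting data does not touch cog_database. Values containing the characters | or & would need escaping, so paths with those characters are better edited by hand.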