SCORPiOs - Synteny-guided CORrection of Paralogies and Orthologies

SCORPiOs is a synteny-guided gene tree correction pipeline for clades that have undergone a whole-genome duplication event. SCORPiOs identifies gene trees where the whole-genome duplication is missing or incorrectly placed, based on the genomic locations of the duplicated genes across the different species. SCORPiOs then builds an optimized gene tree consistent with the known WGD event, the species tree, local synteny context, as well as gene sequence evolution.

SCORPiOs is implemented as a Snakemake pipeline. SCORPiOs takes as input either gene trees or multiple alignments, and outputs the corresponding optimized gene trees.


:sparkles: New in SCORPiOs 2.0.0: LORelEi (Lineage-specific Ohnolog Resolution Extension)
SCORPiOs LORelEi analyzes sequence-synteny conflicts in gene trees and diagnoses cases of delayed meiosis resolution following WGD. To learn how to use SCORPiOs and SCORPiOs LORelEi, take a look at the SCORPiOs documentation!


SCORPiOs illustrated

If you use SCORPiOs, please cite:

Parey E, Louis A, Cabau C, Guiguen Y, Roest Crollius H, Berthelot C, Synteny-guided resolution of gene trees clarifies the functional impact of whole genome duplications, Molecular Biology and Evolution, msaa149, https://doi.org/10.1093/molbev/msaa149.

Quick start

Below is a quick start guide to using SCORPiOs. We recommend reading the SCORPiOs documentation for detailed instructions.

Installation

Installing conda

The Miniconda3 package management system manages all SCORPiOs dependencies, including Python packages and other software.

To install Miniconda3:

Installing SCORPiOs

Updating SCORPiOs conda environment

Usage

Setting up your working environment for SCORPiOs

Before any SCORPiOs run, you should:

Running SCORPiOs on example data

Before using SCORPiOs on your data, we recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.

Example 1: Simple SCORPiOs run

SCORPiOs uses a YAML configuration file to specify inputs and parameters for each run. An example configuration file is provided: config_example.yaml. This configuration file runs SCORPiOs on toy example data located in data/example/, which you can use as a reference for input formats.

The only required snakemake arguments to run SCORPiOs are --configfile, the --use-conda flag and the --scheduler=greedy option. You also need to specify the number of threads via --cores. For more advanced options, see the Snakemake documentation.

To run SCORPiOs on example data:

snakemake --configfile config_example.yaml --use-conda --cores 4 --scheduler=greedy

The following output should be generated: SCORPiOs_example/SCORPiOs_output_0.nhx.
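To confirm that the example run completed, one option is to check that the expected output file exists and is not empty. The snippet below is a minimal sketch of such a check; `check_output` is a hypothetical helper name and is not part of SCORPiOs.

```shell
# Hypothetical helper (not shipped with SCORPiOs): succeed only if the
# given file exists and is non-empty.
check_output() {
    if [ -s "$1" ]; then
        echo "OK: $1"
    else
        echo "Missing or empty: $1" >&2
        return 1
    fi
}

# Usage, from the SCORPiOs directory once the run has finished:
# check_output SCORPiOs_example/SCORPiOs_output_0.nhx
```

A non-zero return status suggests the run did not finish; the Snakemake log is the place to look in that case.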

Example 2: Iterative SCORPiOs run

SCORPiOs can run in iterative mode: SCORPiOs corrects the gene trees a first time, then uses the corrected set of gene trees as input for a new correction run, and repeats until convergence. Correcting gene trees improves orthology accuracy, which in turn makes synteny conservation patterns more informative and improves gene tree reconstruction over successive runs. Usually, a small number of iterations (2-3) is enough to reach convergence.

To run SCORPiOs in iterative mode on example data, execute the wrapper bash script iterate_scorpios.sh:

bash iterate_scorpios.sh --snake_args="--configfile config_example.yaml --cores 4 --scheduler=greedy"

Command-line arguments:

Required
--snake_args="SNAKEMAKE ARGUMENTS", should at minimum contain --configfile

Optional
--max_iter=MAXITER, maximum number of iterations to run, default=5.
--min_corr=MINCORR, minimum number of corrected sub-trees to continue to the next iteration, default=1.
--starting_iter=ITER, starting iteration, to resume a run at a given iteration, default=1.

The following output should be generated: SCORPiOs_example/SCORPiOs_output_2_with_tags.nhx.
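The way the three optional arguments interact can be sketched as a simple loop. This is an illustration of the behaviour described above, not the actual content of iterate_scorpios.sh: `iterate_sketch` and `run_one_iteration` are hypothetical names, and `run_one_iteration` is assumed to print the number of corrected subtrees for the iteration it is given.

```shell
# Illustrative sketch only; the real wrapper script may differ.
# Arguments: max_iter (default 5), min_corr (default 1), starting_iter (default 1).
# run_one_iteration is an assumed placeholder that prints the number of
# corrected subtrees obtained at iteration $1.
iterate_sketch() {
    max_iter=${1:-5}
    min_corr=${2:-1}
    iter=${3:-1}
    while [ "$iter" -le "$max_iter" ]; do
        n_corr=$(run_one_iteration "$iter")
        echo "iteration $iter: $n_corr corrected subtrees"
        # Stop when fewer than min_corr subtrees were corrected.
        if [ "$n_corr" -lt "$min_corr" ]; then
            break
        fi
        iter=$((iter + 1))
    done
}
```

With the defaults, the loop ends either after five iterations or as soon as an iteration corrects no subtree, whichever comes first.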

Running SCORPiOs on your data

Preparing your configuration file

To run SCORPiOs on your data, you need to create a new configuration file for your run. Format your input data as required and write your configuration file, using the provided example config_example.yaml as a guide.

To check your configuration, you can execute a dry-run with -n.

snakemake --configfile config.yaml --use-conda -n
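The dry-run check and the actual launch can also be chained, so that the pipeline starts only when the dry-run succeeds. The function below is a minimal sketch under that assumption; `check_then_run` is a hypothetical name, and the `SNAKEMAKE` override variable is an addition of this sketch, not a SCORPiOs or Snakemake feature. It uses only the snakemake options shown in this guide.

```shell
# Hypothetical wrapper: launch the run only if the dry-run succeeds.
# Arguments: configuration file, number of cores (default 4).
# SNAKEMAKE may be set to override the executable (assumption of this sketch).
check_then_run() {
    config=$1
    cores=${2:-4}
    smk=${SNAKEMAKE:-snakemake}
    if "$smk" --configfile "$config" --use-conda -n; then
        "$smk" --configfile "$config" --use-conda --cores "$cores" --scheduler=greedy
    else
        echo "Dry-run failed for $config; fix the configuration first." >&2
        return 1
    fi
}

# Usage:
# check_then_run config.yaml 4
```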

Running SCORPiOs

Finally, you can run SCORPiOs as described above:

snakemake --configfile config.yaml --use-conda --cores 4 --scheduler=greedy

or in iterative mode:

bash iterate_scorpios.sh --snake_args="--configfile config.yaml --cores 4 --scheduler=greedy"

Authors

License

This code may be freely distributed and modified under the terms of the GNU General Public License version 3 (GPL v3).

References

SCORPiOs uses the following tools to build and test gene trees: