JiaoLab2021 / SynDiv

A tool for quick and accurate calculation of syntenic diversity.
MIT License

SynDiv


Introduction

A tool for quick and accurate calculation of syntenic diversity.

Requirements

Please note the following requirements before building and running the software:

Installation

Install via conda

conda create -n syndiv
conda activate syndiv
# Install SynDiv with all dependencies
conda install -c bioconda -c conda-forge -c duzezhen syndiv

Building on Linux

Use the following steps to build the software:

  1. First, obtain the source code.
git clone https://github.com/JiaoLab2021/SynDiv.git
cd SynDiv
  2. Next, compile the software and add the current directory to your PATH environment variable.
cmake ./
make
chmod +x SynDiv.py SynDiv_p.py genome2SynDiv_config.py cal_Syn_Fst.py gene_Syn_Fst.py
ln -sf SynDiv.py SynDiv
ln -sf SynDiv_p.py SynDiv_p
ln -sf genome2SynDiv_config.py genome2SynDiv_config
ln -sf cal_Syn_Fst.py cal_Syn_Fst
ln -sf gene_Syn_Fst.py gene_Syn_Fst
echo 'export PATH="$PATH:'$(pwd)'"' >> ~/.bashrc
source ~/.bashrc
  3. Make sure all required dependencies are on your PATH, or that the environment containing them is activated. If they are not installed yet, you can install them with conda:
conda create -n syndiv
conda activate syndiv
# Install software using conda
conda install -c bioconda -c conda-forge samtools minimap2
  4. To verify that the software has been installed correctly, check that each command is available and then perform a test run:
SynDiv -h
SynDiv_p -h
SynDiv_c -h
samtools
minimap2
# test
cd test
nohup /usr/bin/time -v SynDiv -r genome/refgenome.fa -c configuration.txt &>log.txt &
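
As a complement to running each command by hand, the PATH check can be scripted. The following is a minimal sketch, not part of SynDiv: the helper name `missing_tools` is hypothetical, and the list of executables is simply copied from the verification step above.

```python
import shutil

# Executable names copied from the verification step above.
REQUIRED_TOOLS = ["SynDiv", "SynDiv_p", "SynDiv_c", "samtools", "minimap2"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

If anything is reported as missing, re-check the PATH export in `~/.bashrc` or activate the conda environment before retrying.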

Usage

Input Files

To get started quickly, you need an aligns file and a syri.out file for each sample. Once you have these files, prepare the reference genome and a configuration file that lists each sample with its query genome, aligns file, and syri.out file.

Please note that the chromosome names in the query genome must match those in the reference genome.
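
The naming requirement can be checked before a run. The sketch below is an illustration, not a SynDiv utility: the function names are hypothetical, it assumes uncompressed FASTA files, and it treats the first whitespace-delimited token of each header line as the sequence name. Whether every query sequence must appear in the reference, or only the chromosomes used for alignment, is an assumption here.

```python
def fasta_names(path):
    """Return the set of sequence names (first token of each '>' header)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def unmatched_names(ref_fasta, query_fasta):
    """Return query sequence names that are absent from the reference."""
    return sorted(fasta_names(query_fasta) - fasta_names(ref_fasta))
```

For example, `unmatched_names("refgenome.fa", "sample1.fa")` returns an empty list when every name in the query genome also occurs in the reference.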

Sample names and file paths should not contain special characters, especially underscores (_).

# configuration file
sample1 sample1.fa sample1.aligns sample1.syri.out
sample2 sample2.fa sample2.aligns sample2.syri.out
...
sampleN sampleN.fa sampleN.aligns sampleN.syri.out

Fields in the configuration file must be separated by tabs. Code examples for generating the aligns and syri.out files can be found on the wiki.
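
Because spaces are easily mistaken for tabs, it can help to validate the configuration file before a long run. The following is a minimal sketch under stated assumptions: `check_config` is a hypothetical helper, not part of SynDiv; it treats every non-empty line as a data row with exactly four fields, as in the example above; and it only tests for underscores, not for other special characters.

```python
def check_config(path):
    """Return a list of problems found in a configuration file.

    Assumes every non-empty line is a data row with four tab-separated
    fields: sample name, query genome, aligns file, syri.out file.
    """
    problems = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # assumption: blank lines are ignored
            fields = line.split("\t")
            if len(fields) != 4:
                problems.append(
                    f"line {lineno}: expected 4 tab-separated fields, "
                    f"found {len(fields)}"
                )
                continue
            for field in fields:
                if "_" in field:
                    problems.append(f"line {lineno}: '{field}' contains '_'")
    return problems
```

An empty list from `check_config("configuration.txt")` means that no formatting problems of these two kinds were detected.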

Running

Before running the software, it is recommended to set the maximum number of open files using the ulimit -n <number> command. The maximum number of open files can be calculated based on the number of genomes (n) and the number of threads (t) using the following formula:

number = 10 + t*(2n - t - 1)
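
As a worked example of the formula, the small helper below evaluates it for a given number of genomes and threads. The function name `open_file_limit` and the values n = 20, t = 8 are illustrative assumptions, not defaults of SynDiv.

```python
def open_file_limit(n_genomes, n_threads):
    """Evaluate number = 10 + t*(2n - t - 1) from the formula above."""
    n, t = n_genomes, n_threads
    return 10 + t * (2 * n - t - 1)


# 20 genomes, 8 threads: 10 + 8*(40 - 8 - 1) = 258
print(open_file_limit(20, 8))
```

The result is the value to pass to `ulimit -n`; choosing a larger value than the computed one leaves some headroom.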

For convenience, the example below assumes that the reference genome is named refgenome.fa and the configuration file is named configuration.txt.

One-Click Generation

ulimit -n 50000
SynDiv -r refgenome.fa -c configuration.txt &

Note

See the wiki for step-by-step manual execution of SynDiv and for instructions on calculating Syn-Fst.

Citation

Please cite:

License

MIT