Multi-genome synteny detection using a dynamic minimizer graph approach.
ntSynt takes multiple genomes as input and computes the synteny blocks shared by all of the input assemblies. ntSynt builds on the ntJoin codebase.
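To illustrate what a minimizer sketch is, here is a minimal textbook (w, k)-minimizer routine in Python. In every window of w consecutive k-mers, it keeps the k-mer with the smallest hash. It also has a helper that keeps only the minimizers found in every input sequence. This is an illustrative sketch, not ntSynt's implementation. The blake2b hash, the function names (kmer_hash, minimizers, shared_minimizers), the leftmost tie-breaking and the decision to ignore reverse complements are all assumptions made here for clarity.

```python
import hashlib
from typing import List, Set, Tuple


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs for the (w, k)-minimizers of seq.

    In every window of w consecutive k-mers, the k-mer with the smallest hash
    is selected (leftmost on ties). A k-mer chosen by several overlapping
    windows is reported once. Reverse complements are ignored for simplicity.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(km) for km in kmers]
    selected: List[Tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        # Selected positions never move left as the window slides, so
        # comparing with the last entry is enough to drop repeats.
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


def shared_minimizers(genomes: List[str], k: int, w: int) -> Set[str]:
    """Minimizer k-mers that occur in every input sequence."""
    per_genome = [{km for _, km in minimizers(g, k, w)} for g in genomes]
    return set.intersection(*per_genome) if per_genome else set()
```

Larger w values select fewer minimizers, which gives a sparser sketch. Larger k values make each selected k-mer more specific.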
Main steps in the algorithm:
Concept: Lauren Coombe and Rene Warren
Design and implementation: Lauren Coombe
If you use ntSynt in your work, please cite:
Lauren Coombe, Parham Kazemi, Johnathan Wong, Inanc Birol, René L. Warren. Multi-genome synteny detection using minimizer graph mappings. bioRxiv (2024) https://doi.org/10.1101/2024.02.07.579356.
usage: ntSynt [-h] [--fastas_list FASTAS_LIST] -d DIVERGENCE [-p PREFIX] [-k K] [-w W] [-t T] [--fpr FPR] [-b BLOCK_SIZE] [--merge MERGE]
[--w_rounds W_ROUNDS [W_ROUNDS ...]] [--indel INDEL] [-n] [--benchmark] [-f] [--dev] [-v]
[fastas ...]
ntSynt: Multi-genome synteny detection using minimizer graphs
positional arguments:
fastas Input genome fasta files
optional arguments:
-h, --help show this help message and exit
--fastas_list FASTAS_LIST
File listing input genome fasta files, one per line
-d DIVERGENCE, --divergence DIVERGENCE
Approx. maximum percent sequence divergence between input genomes (Ex. -d 1 for 1% divergence).
This will be used to set --indel, --merge, --w_rounds, --block_size
See below for set values - You can also set any of those parameters yourself, which will override these settings.
-p PREFIX, --prefix PREFIX
Prefix for ntSynt output files [ntSynt.k<k>.w<w>]
-k K Minimizer k-mer size [24]
-w W Minimizer window size [1000]
-t T Number of threads [12]
--fpr FPR False positive rate for Bloom filter creation [0.025]
-b BLOCK_SIZE, --block_size BLOCK_SIZE
Minimum synteny block size (bp)
--merge MERGE Maximum distance between collinear synteny blocks for merging (bp).
Can also specify a multiple of the window size (ex. 3w)
--w_rounds W_ROUNDS [W_ROUNDS ...]
List of decreasing window sizes for synteny block refinement
--indel INDEL Threshold for indel detection (bp)
-n, --dry-run Print out the commands that will be executed
--benchmark Store benchmarks for each step of the ntSynt pipeline
-f, --force Run all ntSynt steps, regardless of existing output files
--dev Run in developer mode to retain intermediate files, log verbose output
-v, --version show program's version number and exit
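The --merge option accepts either a distance in bp or a multiple of the window size, such as 3w. The following Python sketch shows how such a value could be turned into bp and used as a gap threshold when joining neighbouring blocks. This is a simplified illustration, not ntSynt's code. It assumes the multiplier applies to the -w window size. It also treats blocks as (start, end) intervals on a single sequence, whereas ntSynt merges blocks that are collinear across all genomes. The names resolve_merge_distance and merge_blocks are hypothetical.

```python
import re
from typing import List, Tuple


def resolve_merge_distance(merge: str, w: int) -> int:
    """Convert a --merge value to base pairs.

    Accepts a plain number of bp ("10000") or a multiple of the window size
    ("3w"). Assumption: the multiplier applies to the -w window size.
    """
    match = re.fullmatch(r"(\d+)(w?)", merge.strip())
    if match is None:
        raise ValueError(f"unrecognized --merge value: {merge!r}")
    value = int(match.group(1))
    return value * w if match.group(2) else value


def merge_blocks(blocks: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge (start, end) blocks on one sequence separated by <= max_gap bp."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With the default -w of 1000, a value of 3w corresponds to 3000 bp under this assumption.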
Given the approximate maximum divergence between the supplied genomes, ntSynt will set these default parameters:

| Divergence range | Default parameters |
|---|---|
| < 1% | --block_size 500 --indel 10000 --merge 10000 --w_rounds 100 10 |
| 1% - 10% | --block_size 1000 --indel 50000 --merge 100000 --w_rounds 250 100 |
| > 10% | --block_size 10000 --indel 100000 --merge 1000000 --w_rounds 500 250 |
Any of these parameters can be overridden by specifying them in your command. While these settings generally work well for the associated divergence range, we highly recommend customizing them for your particular requirements.
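The divergence table and the override rule can be expressed as a small lookup. The following Python sketch encodes the table above, then lets any explicitly set value replace the default. The function names are hypothetical. The boundary handling is one reading of the table: exactly 1% and exactly 10% fall in the 1% - 10% row.

```python
from typing import Any, Dict


def default_parameters(divergence: float) -> Dict[str, Any]:
    """Defaults from the divergence table; divergence is a percentage."""
    if divergence < 1:
        return {"block_size": 500, "indel": 10000, "merge": 10000, "w_rounds": [100, 10]}
    if divergence <= 10:
        return {"block_size": 1000, "indel": 50000, "merge": 100000, "w_rounds": [250, 100]}
    return {"block_size": 10000, "indel": 100000, "merge": 1000000, "w_rounds": [500, 250]}


def resolve_parameters(divergence: float, **overrides: Any) -> Dict[str, Any]:
    """Start from the divergence defaults, then apply any user-set values."""
    params = default_parameters(divergence)
    # Only values the user actually set (not None) replace the defaults.
    params.update({key: value for key, value in overrides.items() if value is not None})
    return params
```

For example, resolve_parameters(5, block_size=2000) keeps the 1% - 10% defaults for --indel, --merge and --w_rounds, but uses a 2000 bp minimum block size. This mirrors running ntSynt -d 5 -b 2000.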
To install ntSynt using conda:
conda install -c bioconda -c conda-forge ntsynt
To install ntSynt from source using meson and ninja:
meson setup build --prefix=/path/to/desired/install/location
cd build
ninja install
Test your ntSynt installation using our provided demo:
cd tests
./run_ntSynt_demo.sh
Once the script has executed successfully, you can compare its output files with those in tests/expected_results.
To compute the synteny blocks between 3 assemblies (assembly1.fa, assembly2.fa, assembly3.fa) with default parameters, where the maximum sequence divergence among these is ~5%, run:
ntSynt -d 5 assembly1.fa assembly2.fa assembly3.fa
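When comparing many genomes, it can be easier to pass the inputs through --fastas_list, a file that lists one FASTA path per line. The following Python sketch builds that file's contents and an ntSynt argument list. It uses only flags shown in the usage message above (--fastas_list, -d, -p, -n). The helper names are hypothetical, and the sketch does not run ntSynt itself.

```python
from typing import List, Optional, Sequence


def fastas_list_text(fastas: Sequence[str]) -> str:
    """Contents for a --fastas_list file: one FASTA path per line."""
    return "".join(f"{path}\n" for path in fastas)


def ntsynt_command(list_path: str, divergence: float,
                   prefix: Optional[str] = None, dry_run: bool = False) -> List[str]:
    """Build an ntSynt argument list using only flags from the usage message."""
    cmd = ["ntSynt", "--fastas_list", list_path, "-d", str(divergence)]
    if prefix is not None:
        cmd += ["-p", prefix]
    if dry_run:
        cmd.append("-n")  # print the commands that would be executed
    return cmd
```

For example, you could write fastas_list_text(["assembly1.fa", "assembly2.fa", "assembly3.fa"]) to a file such as fastas.txt (a hypothetical name). You would then pass ntsynt_command("fastas.txt", 5) to subprocess.run(..., check=True). This is equivalent to the three-assembly command above.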
The main output file is named <prefix>.synteny_blocks.tsv and contains the computed synteny blocks in TSV format.
The columns of this output synteny blocks TSV:
For a basic statistical summary of the computed synteny blocks, you can use the script denovo_synteny_block_stats.py, found in analysis_scripts:
python3 denovo_synteny_block_stats.py -h
usage: denovo_synteny_block_stats.py [-h] --tsv TSV --fai FAI [FAI ...]
Compute de novo stats on synteny blocks
optional arguments:
-h, --help show this help message and exit
--tsv TSV ntSynt synteny block file
--fai FAI [FAI ...] FAI files for the compared genomes
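The stats script takes FAI files so that block coordinates can be related to genome size. The following Python sketch shows one summary of that kind: the fraction of a genome covered by synteny blocks. It is not the logic of denovo_synteny_block_stats.py. It assumes you have already extracted each block's (start, end) coordinates for one genome, grouped by sequence name. It also assumes half-open coordinates. The function names are hypothetical. It relies only on the standard .fai layout, where the first two tab-separated columns are the sequence name and its length.

```python
from typing import Dict, Iterable, List, Tuple


def read_fai_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence name -> length from .fai lines (NAME, LENGTH, ...)."""
    lengths: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            fields = line.rstrip("\n").split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def covered_fraction(blocks: Dict[str, List[Tuple[int, int]]],
                     lengths: Dict[str, int]) -> float:
    """Fraction of the genome covered by blocks, counting overlaps once."""
    covered = 0
    for intervals in blocks.values():
        cur_start = cur_end = None
        for start, end in sorted(intervals):
            if cur_end is None or start > cur_end:
                # Gap before this interval: close out the previous run.
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)  # overlapping or touching: extend
        if cur_end is not None:
            covered += cur_end - cur_start
    total = sum(lengths.values())
    return covered / total if total else 0.0
```

Merging overlapping intervals before summing keeps bases that fall in more than one block from being counted twice.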
More information can be found on our wiki page.
ntSynt Copyright (c) 2023 British Columbia Cancer Agency Branch. All rights reserved.
ntSynt is released under the GNU General Public License v3
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
For commercial licensing options, please contact Patrick Rebstein prebstein@bccancer.bc.ca