MinYS allows targeted assembly of bacterial genomes using a reference-guided pipeline. It consists of 3 steps:
MinYS was developed in the GenScale lab by:
conda install -c bioconda minys
git clone https://github.com/cguyomar/MinYS
cd MinYS
make -C graph_simplification/nwalign/
MinYS.py -1 test_data/reads.1.fq -2 test_data/reads.2.fq -ref test_data/ref.fa -out MinYS_results
# look at the output file:
head MinYS_results/gapfilling/minia_k31_abundancemin_auto_filtered_400_gapfilling_k31_abundancemin_auto.simplified.gfa
# should contain only one sequence node of 15,722 bp, or see also the logs:
more MinYS_results/logs/simplification.log
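The same check can be scripted. The sketch below assumes the output is a standard GFA 1 file in which each segment is an `S` line carrying its sequence in the third tab-separated field. The function name `segment_lengths` is illustrative and not part of MinYS.

```python
from pathlib import Path


def segment_lengths(gfa_path):
    """Return {segment_name: sequence_length} for every S line of a GFA 1 file."""
    lengths = {}
    for line in Path(gfa_path).read_text().splitlines():
        fields = line.split("\t")
        # Segment lines look like: S <name> <sequence> [optional tags]
        if len(fields) >= 3 and fields[0] == "S":
            # "*" means the sequence is not stored in the file; report 0 for it.
            lengths[fields[1]] = 0 if fields[2] == "*" else len(fields[2])
    return lengths


# Example call on the test run output (path shortened here, use the real file name):
# print(segment_lengths("MinYS_results/gapfilling/<graph>.simplified.gfa"))
```

For the test data, the returned dictionary would be expected to hold a single entry if the run produced the single sequence node described above.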
[main options]:
-in (1 arg) : Input reads file
-1 (1 arg) : Input reads first file
-2 (1 arg) : Input reads second file
-fof (1 arg) : Input file of read files (if paired files, 2 columns tab-separated)
-out (1 arg) : output directory for result files [Default: ./MinYS_results]
[mapping options]:
-ref (1 arg) : Bwa index
-mask (1 arg) : Bed file for region removed from mapping
[assembly options]:
-minia-bin (1 arg) : Path to Minia binary (if not in $PATH)
-assembly-kmer-size (1 arg) : Kmer size used for Minia assembly (should be given even if bypassing the Minia assembly step, as it is useful knowledge for gap-filling) [Default: 31]
(1 arg) : Minimal abundance of kmers used for assembly [Default: auto]
-min-contig-size (1 arg) : Minimal size for a contig to be used in gap-filling [Default: 400]
[gapfilling options]:
-mtg-dir (1 arg) : Path to MindTheGap build directory (if not in $PATH)
(1 arg) : Kmer size used for gap-filling [Default: 31]
(1 arg) : Minimal abundance of kmers used for gap-filling [Default: auto]
-max-nodes (1 arg) : Maximum number of nodes in contig graph [Default: 300]
-max-length (1 arg) : Maximum length of gap-filling (nt) [Default: 50000]
[simplification options]:
-l (1 arg) : Length of minimum prefix for node merging, default should work for most cases [Default: 100]
[continue options]:
-contigs (1 arg) : Contigs in fasta format - override mapping and assembly
-graph (1 arg) : Graph in h5 format - override graph creation
[core options]:
-nb-cores (1 arg) : Number of cores [Default: 0]
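The -fof option takes a file of read files, with two tab-separated columns when the reads are paired. A minimal sketch for writing such a file follows. The helper name `write_fof` and the sample file names are made up for illustration, and the one-pair-per-line layout is inferred from the option description, not from the MinYS source.

```python
from pathlib import Path


def write_fof(pairs, fof_path):
    """Write one line per read pair, with the two file paths separated by a tab."""
    lines = [f"{first}\t{second}" for first, second in pairs]
    Path(fof_path).write_text("\n".join(lines) + "\n")
    return fof_path


# Example with hypothetical file names:
# write_fof([("sampleA_1.fq", "sampleA_2.fq"), ("sampleB_1.fq", "sampleB_2.fq")], "reads.fof")
# The resulting file would then be passed as: MinYS.py -fof reads.fof ...
```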
If Minia or MindTheGap are not in $PATH, their locations can be supplied with -minia-bin or -mtg-dir.
-contigs and -graph may be used to bypass the mapping/assembly step and the graph creation, respectively. In the first case, -assembly-kmer-size should be set to the overlap between contigs.
Where required, the environment variable HDF5_USE_FILE_LOCKING can be set to 'FALSE'.
A step-by-step tutorial of the analysis of one sample presented in the paper is available as a Jupyter notebook (or in markdown).
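As an illustration of the bypass options, the sketch below builds a command line that starts from existing contigs and prepares an environment with HDF5_USE_FILE_LOCKING set to 'FALSE'. Only flags listed in the options above are used. The function names are hypothetical, and whether further options are required in this mode has not been verified here.

```python
import os


def build_continue_command(contigs, reads_1, reads_2, overlap, out_dir):
    """Assemble a MinYS.py argument list that skips mapping and assembly."""
    return [
        "MinYS.py",
        "-1", reads_1,
        "-2", reads_2,
        "-contigs", contigs,
        # With -contigs, this value describes the overlap between contigs.
        "-assembly-kmer-size", str(overlap),
        "-out", out_dir,
    ]


def environment_without_hdf5_locking():
    """Copy the current environment and set HDF5_USE_FILE_LOCKING to FALSE."""
    env = dict(os.environ)
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


# To launch the run (not executed here):
# import subprocess
# cmd = build_continue_command("contigs.fa", "reads.1.fq", "reads.2.fq", 31, "MinYS_results")
# subprocess.run(cmd, env=environment_without_hdf5_locking(), check=True)
```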
Some utility scripts are supplied along with MinYS to facilitate post-processing of the output GFA graph:
graph_simplification/enumerate_paths.py in.gfa out_dir
Enumerate all the paths of each connected component of a graph. Returns paths that are substantially different from one another (ANI < 99% or alignment coverage < 99%).
graph_simplification/filter_components.py in.gfa min_size
Return a sub-graph containing all the connected components larger than min_size (in total assembled nt).
graph_simplification/gfa2fasta.py in.gfa out.fa
Return all the sequences of the graph in a multi-fasta file
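To make the component filter and the FASTA export concrete, here is a minimal sketch of the idea, not the code of the scripts themselves. It assumes GFA 1 input where `S` lines hold sequences and `L` lines connect two segments, and it keeps components whose summed sequence length is strictly greater than `min_size`; whether the real script uses a strict comparison is not verified. All function names are hypothetical.

```python
from collections import defaultdict


def parse_gfa(lines):
    """Collect segment sequences and the segment pairs joined by links."""
    sequences, links = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            sequences[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 4:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.append((fields[1], fields[3]))
    return sequences, links


def large_components(sequences, links, min_size):
    """Return components (sorted name lists) with total length above min_size."""
    neighbours = defaultdict(set)
    for a, b in links:
        if a in sequences and b in sequences:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, kept = set(), []
    for start in sequences:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if sum(len(sequences[n]) for n in component) > min_size:
            kept.append(sorted(component))
    return kept


def to_fasta(sequences, names):
    """Format the named segments as multi-FASTA text."""
    return "".join(f">{name}\n{sequences[name]}\n" for name in names)
```

Orientation and overlap information on the links is ignored here, since only connectivity matters for grouping nodes into components.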
MinYS: Mine Your Symbiont by targeted genome assembly in symbiotic communities. Guyomar C, Delage W, Legeai F, Mougel C, Simon JC, Lemaitre C. NAR Genomics and Bioinformatics 2020, 2(3):lqaa047, doi:10.1093/nargab/lqaa047