cguyomar / MinYS

MineYourSymbiont: targeted genome assembly pipeline for metagenomics.
GNU Affero General Public License v3.0

MinYS - MineYourSymbiont

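MinYS allows targeted assembly of bacterial genomes using a reference-guided pipeline. It consists of three steps, mirrored by the option groups listed below: mapping of the reads to a reference and assembly of the mapped reads, gap-filling of the resulting contigs, and simplification of the output graph.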

MinYS was developed in the GenScale lab by:

Requirements

Installation

With Conda

conda install -c bioconda minys

From source

git clone https://github.com/cguyomar/MinYS
cd MinYS
make -C graph_simplification/nwalign/
./MinYS.py

Test run

MinYS.py -1 test_data/reads.1.fq -2 test_data/reads.2.fq -ref test_data/ref.fa -out MinYS_results
# look at the output file:
head MinYS_results/gapfilling/minia_k31_abundancemin_auto_filtered_400_gapfilling_k31_abundancemin_auto.simplified.gfa
# should contain only one sequence node of 15,722 bp, or see also the logs:
more MinYS_results/logs/simplification.log
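The `head` check above can also be done programmatically. The following is a minimal sketch, not part of MinYS, that lists the sequence nodes of a GFA file with their lengths. It assumes the output follows the GFA 1 convention (tab-separated `S` lines holding the node name and its sequence, with an optional `LN:i:` tag when the sequence is given as `*`); the function name `segment_lengths` is hypothetical.

```python
def segment_lengths(gfa_path):
    """Return {node_name: length} for every sequence node (S line) in a GFA file.

    Assumption: GFA 1 layout, i.e. 'S<TAB>name<TAB>sequence[<TAB>tags...]'.
    """
    lengths = {}
    with open(gfa_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip headers, links and malformed lines.
            if fields[0] != "S" or len(fields) < 3:
                continue
            name, sequence = fields[1], fields[2]
            if sequence == "*":
                # Sequence not stored inline: fall back to the LN:i: tag if present.
                length = next(
                    (int(tag[5:]) for tag in fields[3:] if tag.startswith("LN:i:")),
                    0,
                )
            else:
                length = len(sequence)
            lengths[name] = length
    return lengths
```

Calling `segment_lengths()` on the simplified GFA file of the test run should return a single entry if the run behaved as described above.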

Options

[main options]:
  -in                   (1 arg) :    Input reads file
  -1                    (1 arg) :    Input reads first file
  -2                    (1 arg) :    Input reads second file
  -fof                  (1 arg) :    Input file of read files (if paired files, 2 columns tab-separated)
  -out                  (1 arg) :    output directory for result files [Default: ./MinYS_results]

[mapping options]:
  -ref                  (1 arg) :    Bwa index
  -mask                 (1 arg) :    Bed file for region removed from mapping

[assembly options]:
  -minia-bin            (1 arg) :    Path to Minia binary (if not in $PATH)
  -assembly-kmer-size   (1 arg) :    Kmer size used for Minia assembly (should be given even if bypassing minia assembly step, useful knowledge for gap-filling) [Default: 31]
  -assembly-abundance-min
                        (1 arg) :    Minimal abundance of kmers used for assembly [Default: auto]
  -min-contig-size      (1 arg) :    Minimal size for a contig to be used in gap-filling [Default: 400]

[gapfilling options]:
  -mtg-dir              (1 arg) :    Path to MindTheGap build directory (if not in $PATH)
  -gapfilling-kmer-size
                        (1 arg) :    Kmer size used for gap-filling [Default: 31]
  -gapfilling-abundance-min
                        (1 arg) :    Minimal abundance of kmers used for gap-filling [Default: auto]
  -max-nodes            (1 arg) :    Maximum number of nodes in contig graph [Default: 300]
  -max-length           (1 arg) :    Maximum length of gap-filling (nt) [Default: 50000]

[simplification options]:
  -l                    (1 arg) :    Length of minimum prefix for node merging, default should work for most cases [Default: 100]

[continue options]:
  -contigs              (1 arg) :    Contigs in fasta format - override mapping and assembly
  -graph                (1 arg) :    Graph in h5 format - override graph creation

[core options]:
  -nb-cores             (1 arg) :    Number of cores [Default: 0]
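To illustrate the `-fof` option, here is a minimal sketch, not part of MinYS, that writes a file of read files for paired data and assembles the matching command line. It only uses flags listed above. The helper names `write_fof` and `minys_command` are hypothetical, and the layout (one read pair per line) is an assumption based on the "2 columns tab-separated" description.

```python
from pathlib import Path


def write_fof(pairs, fof_path):
    """Write a file of read files: one line per pair, two tab-separated columns.

    Assumption: each line holds the first and second read file of one pair.
    """
    lines = [f"{first}\t{second}" for first, second in pairs]
    Path(fof_path).write_text("\n".join(lines) + "\n")
    return Path(fof_path)


def minys_command(fof_path, ref, out_dir="./MinYS_results", nb_cores=0):
    """Build the MinYS argument list (defaults taken from the option listing)."""
    return [
        "MinYS.py",
        "-fof", str(fof_path),
        "-ref", str(ref),
        "-out", str(out_dir),
        "-nb-cores", str(nb_cores),
    ]
```

The returned list can be passed to `subprocess.run()` once the read files and the reference exist; the file names you supply are up to you.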

Documentation

A step-by-step tutorial covering the analysis of one sample presented in the paper is available as a Jupyter notebook (or in Markdown).

Utility scripts

Some utility scripts are supplied with MinYS to facilitate post-processing of the output GFA graph:

Reference

MinYS: Mine Your Symbiont by targeted genome assembly in symbiotic communities. Guyomar C, Delage W, Legeai F, Mougel C, Simon JC, Lemaitre C. NAR Genomics and Bioinformatics 2020, 2(3):lqaa047, doi:10.1093/nargab/lqaa047