Kinggerm / GetOrganelle

Organelle Genome Assembly Toolkit (Chloroplast/Mitocondrial/ITS)
GNU General Public License v3.0
280 stars 54 forks source link
assembly chloroplast fastg genome-skimming-data its mitochondria mitogenome plastome ribosomal

GetOrganelle

Anaconda-Server Badge Anaconda-Server Badge Anaconda-Server Badge GitHub release European Galaxy server

notice: please update to 1.7.5+, which fixed the bug on the multiplicity estimation of self-loop vertices.

This toolkit assemblies organelle genome from genomic skimming data.

It achieved the best performance overall both on simulated and real data and was recommended as the default for chloroplast genome assembly in a third-party comparison paper (Freudenthal et al. 2020. Genome Biology).

Please denote the version of GetOrganelle as well as the dependencies in your manuscript for reproducible science.

Citation: Jian-Jun Jin, Wen-Bin Yu, Jun-Bo Yang, Yu Song, Claude W. dePamphilis, Ting-Shuang Yi, De-Zhu Li. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21, 241 (2020). https://doi.org/10.1186/s13059-020-02154-5

License: GPL https://www.gnu.org/licenses/gpl-3.0.html

Please also cite the dependencies if used:

SPAdes: Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. and Korobeynikov, A. 2020. Using SPAdes de novo assembler. Current protocols in bioinformatics, 70(1), p.e102.

Bowtie2: Langmead, B. and S. L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357-359.

BLAST+: Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer and T. L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Bandage: Wick, R. R., M. B. Schultz, J. Zobel and K. E. Holt. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31: 3350-3352.

Installation & Initialization

GetOrganelle is currently maintained under Python 3.7.0, but designed to be compatible with versions higher than 3.5.1 and 2.7.11. It was built for Linux and macOS. Windows Subsystem Linux is currently not supported, we are working on this.

Test

Download a simulated Arabidopsis thaliana WGS dataset:

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz

then verify the integrity of downloaded files using md5sum:

md5sum Arabidopsis_simulated.*.fq.gz
# 935589bc609397f1bfc9c40f571f0f19  Arabidopsis_simulated.1.fq.gz
# d0f62eed78d2d2c6bed5f5aeaf4a2c11  Arabidopsis_simulated.2.fq.gz
# Please re-download the reads if your md5 values unmatched above

then do the fast plastome assembly (memory: ~600MB, CPU time: ~60s):

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

You are going to get a similar running log as here and the same result as here.

Find more real data examples at GetOrganelle/wiki/Examples, GetOrganelleGallery and GetOrganelleComparison.

Instruction

Find more organelle genome assembly instruction at GetOrganelle/wiki.

In most cases, what you actually need to do is just typing in one simple command as suggested in Recipes. But you are still highly recommended reading the following minimal introductions:

Starting from Reads

The green workflow in the flowchart below shows the processes of get_organelle_from_reads.py.

Starting from Assembly

The blue workflow in the chat below shows the processes of get_organelle_from_assembly.py.

GetOrganelle flowchart

flowchart

Recipes

Please refer to the GetOrganelle FAQ to fine-tune the arguments, especially concerning word size, memory, and clock time.

From Reads

From Assembly Graph

There are as many available organelle types as the From Reads section (see more by get_organelle_from_assembly.py -h), but the simplest usage is not that different. Here is an example to extract the plastid genome from an existing assembly graph (*.fastg/*.gfa; e.g. from long-read sequencing assemblies):

get_organelle_from_assembly.py -F embplant_pt -g ONT_assembly_graph.gfa

Arguments

See a brief illustrations of those arguments by typing in:

get_organelle_from_reads.py -h

or see the detailed illustrations:

get_organelle_from_reads.py --help

The same brief -h and verbose --help menu can be find for get_organelle_from_assembly.py.

You may also find a summary of above information here at Usage.

Contact

Please check GetOrganelle wiki page first. If your question is running specific, please attach the get_org.log.txt file and the post-slimming assembly graph (assembly_graph.fastg.extend_*.fastg, could be Bandage-visualized *.png format to protect your data privacy).

Although older versions like 1.6.3/1.7.1/1.7.6 may be more stable, but we always strongly encourage you to keep updated. GetOrganelle was actively updated with new fixes and new features, but new bugs too. So if you catch one, please do not be surprised and report it to us. We usually have quick response to bugs.

Do NOT directly write to us with your questions, instead please post the questions publicly, using above platforms (we will be informed automatically) or any other platforms (inform us of it). Our emails (jianjun.jin@columbia.edu, yuwenbin@xtbg.ac.cn) are only for receiving public question alert and private data (if applied) associated with those public questions. When you send your private data to us, enclose the email with a link where you posted the question. Our only reply emails will be a receiving confirmation, while our answers will be posted in a public place.