Kinggerm / GetOrganelle

Organelle Genome Assembly Toolkit (Chloroplast/Mitochondrial/ITS)
GNU General Public License v3.0

Regarding the Use of Incomplete Plant Mitochondrial Genome Assembly Outputs for Subsequent Analysis #317

Open twdawd opened 8 months ago

twdawd commented 8 months ago

Thank you very much for making this software. I have a question: when I assembled a plant mitochondrial genome, I got a file named embplant_mt.K105.scaffolds.graph1.1.path_sequence.fasta. Can I use this FASTA file in downstream analyses such as gene annotation and phylogeny? Looking forward to your reply.

JianjunJin commented 8 months ago

The assembly quality of incomplete organelle genomes (scaffolds) cannot be guaranteed in the current version of GetOrganelle (1.7.7.0 or earlier). However, enhancements for chloroplast and animal mitochondrial genomes were made in version 1.8.0 (not officially released yet; see https://github.com/Kinggerm/GetOrganelle/tree/intermediate_graph).

The primary concern with using the plant mitochondrial assembly directly is the potential inclusion of chloroplast loci, caused by the sequence similarity between plastid (pt) and mitochondrial (mt) genomes and by false BLAST hits. The problem becomes more severe when the assembly is incomplete, because depth and graph information can barely be used to help identify contigs. It is therefore advisable to run get_organelle_from_assembly.py on the current output with stricter contig filtering, which is quick and easy. The following command is a suggested example. Even after this step, you are encouraged to carefully assess whether chloroplast sequences remain in the mitochondrial assembly.

 get_organelle_from_assembly.py -F embplant_mt --slim-options "--perc-identity 99.0 -e 1e-250" -g your_old_output/filtered_K105.assembly_graph.fastg.extend-embplant_mt-embplant_pt.fastg -o further_purified_result
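
One possible way to do the final check for residual chloroplast sequence is sketched here. It is not part of GetOrganelle. It assumes BLAST+ (`blastn`) is installed and that you supply two paths yourself through the hypothetical variables `MT_FASTA` (the purified mitochondrial contigs) and `PT_REF` (a chloroplast reference genome). The helper `summarize_pt_hits` and the 95% identity cutoff are illustrative choices, not recommended values.

```shell
#!/bin/sh
# Sketch only: list mitochondrial contigs that align to a chloroplast reference.
# MT_FASTA, PT_REF and summarize_pt_hits are hypothetical names for illustration.

# Sum aligned length per query contig from BLAST tabular output (-outfmt 6).
# Column 1 = query id, column 3 = percent identity, column 4 = alignment length.
# Overlapping hits are counted more than once, so totals are an upper bound.
summarize_pt_hits() {
    # $1: BLAST tabular file; $2: minimum percent identity (default 95)
    awk -v min_id="${2:-95}" '
        $3 >= min_id { total[$1] += $4 }
        END { for (q in total) print q "\t" total[q] }
    ' "$1" | sort
}

# Run BLAST only when both input paths are set and blastn is available.
if [ -n "${MT_FASTA:-}" ] && [ -n "${PT_REF:-}" ] \
    && command -v blastn >/dev/null 2>&1; then
    blastn -query "$MT_FASTA" -subject "$PT_REF" \
        -outfmt 6 -evalue 1e-10 > mt_vs_pt.tsv
    summarize_pt_hits mt_vs_pt.tsv 95
fi
```

Each output line gives a contig name and the total length of its high-identity alignments to the chloroplast reference. Contigs with long totals are candidates for manual inspection; genuine mitochondrial sequence can also contain plastid-derived insertions, so a hit alone does not prove contamination.
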
twdawd commented 8 months ago

I understand! Thank you!