bichangwei / PMAT

An efficient assembly tool for plant mitochondrial genome

Question about size of the resulting fasta file of mitochondrial genome sequencing #22

Open Ponerinae opened 5 months ago

Ponerinae commented 5 months ago

Dear Changwei,

I used autoMito to generate the two GFA files (raw and master) for the assembled Malus domestica genome (as in Demo2) with the following command:

~/PMAT-1.5.3/bin/PMAT autoMito -i Malus_domestica.540Mb.fa -o ./out.all -st hifi -g 703m -mm -tp all -cpu 20

After running the ll command, I found that both files are around 500,000 bytes in size:

-rw-rw-r-- 1 526388 2024-07-02 21:06:48 PMAT_mt_master.gfa
-rw-rw-r-- 1 557590 2024-07-02 21:06:48 PMAT_mt_raw.gfa

The contigs included in raw.gfa are:

1 2 3 2159 4834 15388 1233
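A contig list like the one above can also be read straight from the GFA file together with each contig's length. The following is only a minimal sketch, assuming the file follows the standard GFA layout in which segment lines start with `S`, followed by the segment name and sequence in tab-separated fields (with an optional `LN:i:` length tag when the sequence is given as `*`). The function name `gfa_segment_lengths` is hypothetical and not part of PMAT.

```python
def gfa_segment_lengths(lines):
    """Return {segment_name: length_in_bases} from the S (segment) lines of a GFA file."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip header (H), link (L), path (P) and malformed lines
        name, seq = fields[1], fields[2]
        if seq != "*":
            lengths[name] = len(seq)
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths[name] = int(tag[5:])
                break
    return lengths


# Small in-memory demo; for a real file, pass an open file handle instead,
# e.g. gfa_segment_lengths(open("PMAT_mt_raw.gfa")).
demo = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGTAC\n",
    "S\t2\t*\tLN:i:42\n",
    "L\t1\t+\t2\t+\t0M\n",
]
print(gfa_segment_lengths(demo))  # {'1': 6, '2': 42}
```

Summing the returned lengths gives a total to set against the reference length, which is a more direct comparison than the file size on disk.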

However, the reference mitochondrial sequence for apple that I downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_018554.1/) is only about 403,000 bytes in size, and the raw GFA file I obtained contains many contigs that are not in the reference sequence, e.g. contig 4834. I copied and pasted that contig into NCBI's search engine and found that it actually belongs to the apple chloroplast genome. In other words, the autoMito command I used caused chloroplast sequences to be included in the mt GFA file, which is supposed to contain only the mitochondrial genome.
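For anyone who wants to repeat the check on a single contig, the sequence first has to be pulled out of the GFA file. The following is a rough sketch, assuming standard tab-separated `S` (segment) lines with the sequence in the third field. The helper name `gfa_segment_to_fasta` and the 60-column line width are illustrative choices, not anything provided by PMAT.

```python
def gfa_segment_to_fasta(lines, segment_name, width=60):
    """Return one GFA segment as a FASTA-formatted string."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "S" and fields[1] == segment_name:
            seq = fields[2]
            if seq == "*":
                raise ValueError(f"segment {segment_name} has no stored sequence")
            # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
            body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
            return f">{segment_name}\n{body}\n"
    raise KeyError(f"segment {segment_name} not found")


# In-memory demo with a made-up 10-base sequence; for a real file, pass an
# open file handle, e.g. gfa_segment_to_fasta(open("PMAT_mt_raw.gfa"), "4834").
demo_gfa = ["H\tVN:Z:1.0\n", "S\t4834\tACGTACGTAC\n"]
print(gfa_segment_to_fasta(demo_gfa, "4834", width=4), end="")
```

The resulting FASTA text can be pasted into a sequence search such as NCBI BLAST to see which genome the contig matches.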

Do you have any clue on this?

Thank you very much!

bichangwei commented 5 months ago

Thank you very much for using PMAT. Regarding your question, the "-tp" parameter is used to set the type of organelle to be assembled. If you only want to assemble the mitochondrial genome, please use "-tp mt". The "-tp all" parameter you are currently using will assemble both chloroplast and mitochondrial genomes.

Ponerinae commented 5 months ago

Thank you very much for your reply! I tried -tp mt as well, but the resulting mt_raw and mt_master GFA files still have the same issue: they are supposed to contain only the mitochondrial genome, but they include chloroplast contigs.


The command I used is:

~/PMAT-1.5.3/bin/PMAT autoMito -i Malus_domestica.540Mb.fa -o ./out.all -st hifi -g 703m -mm -tp mt -cpu 20

The comparative alignment plot I generated with MUMmer (mt_raw vs. the reference sequence) is as follows:

[Image: output_prefix]

As shown in the plot, several contigs are not included in the reference sequence, e.g. 4834 and 1233.