RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

Annotation discrepancies #8

Closed: sihellem closed this issue 4 years ago

sihellem commented 4 years ago

Dear Rémi,

I hope you are well.

We have been running tests comparing the annotations produced by MitoFinder with those from MITOS2, and found some discrepancies.

We assembled the reads with metaSPAdes, modifying lines 96 and 98 of runMetaspades.py to specify the k-mer sizes (21, 31, 41, 51, 71 and 91).
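For anyone wanting to reproduce a custom k-mer run outside MitoFinder, the SPAdes `-k` option takes a comma-separated list of odd k-mer sizes below 128. The sketch below only builds such a command line; `metaspades_command` is a hypothetical helper, the file names are placeholders, and it does not reproduce what lines 96 and 98 of runMetaspades.py actually contain.

```python
from typing import List, Sequence


def metaspades_command(reads1: str, reads2: str, outdir: str,
                       kmers: Sequence[int] = (21, 31, 41, 51, 71, 91),
                       threads: int = 4) -> List[str]:
    """Hypothetical helper: build a metaSPAdes command with a custom k-mer list."""
    # SPAdes expects odd k-mer sizes below 128, listed in ascending order.
    for k in kmers:
        if k % 2 == 0 or not 0 < k < 128:
            raise ValueError(f"invalid k-mer size for SPAdes: {k}")
    kmer_arg = ",".join(str(k) for k in sorted(kmers))
    return ["metaspades.py", "-1", reads1, "-2", reads2,
            "-k", kmer_arg, "-t", str(threads), "-o", outdir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; validating the k-mers up front gives a clearer error than a failed assembly.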

Here is the annotation performed directly by MitoFinder: 1191_mtDNA_contig.txt

And here, the one performed by MITOS2 by inputting the same contig:

[Image: mitofinder_to_mitos]

As you can see, MitoFinder fails to retrieve tRNA-Trp and tRNA-Cys between ND2 and COX1, and tRNA-Val between rrnL and rrnS. More generally, the start and stop positions also vary slightly.
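Differences like these can be tabulated systematically by diffing the two annotations feature by feature. A minimal sketch, assuming each tool's output has already been parsed into a dict mapping a normalized gene name to 1-based (start, end) coordinates; the parsing step is not shown, and the gene names and coordinates in the test are hypothetical, not taken from the attached files.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: normalized gene name -> (start, end), 1-based.
Annotation = Dict[str, Tuple[int, int]]


def compare_annotations(a: Annotation, b: Annotation
                        ) -> Tuple[List[str], List[str], Dict[str, Tuple[int, int]]]:
    """Return genes only in a, genes only in b, and (start, end) shifts of b relative to a."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shifts = {}
    for gene in sorted(set(a) & set(b)):
        (start_a, end_a), (start_b, end_b) = a[gene], b[gene]
        if (start_a, end_a) != (start_b, end_b):
            shifts[gene] = (start_b - start_a, end_b - end_a)
    return only_a, only_b, shifts
```

Gene names have to be harmonized first (for example "tRNA-Trp" in one output and "trnW" in another), otherwise the same feature shows up as missing on both sides.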

For further comparison, we ran metaSPAdes (from SPAdes 3.11.1) installed separately from MitoFinder, using the same parameters as for the assembly performed by MitoFinder. The largest contig was submitted to MITOS, with results as follows:
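Picking the largest contig from a SPAdes FASTA output can be scripted rather than done by hand. A minimal sketch with our own small FASTA reader (no external library assumed); it accepts any iterable of lines, such as an open file handle:

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contig(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (header, sequence) of the longest record; raises ValueError if there are none."""
    return max(fasta_records(lines), key=lambda rec: len(rec[1]))
```

For example, `with open("contigs.fasta") as fh: header, seq = longest_contig(fh)`. In a metagenomic assembly the longest contig is not guaranteed to be mitochondrial, so it is still worth checking it against a reference.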

[Image: metaspades_to_mitos]

As you can see, this annotation is consistent with the one obtained by feeding the MitoFinder scaffold to MITOS, except that the latter contains many internal stop codons, which is not the case for the contig assembled with metaSPAdes outside of MitoFinder.

Would the observed discrepancies be linked to the assembly step?

Additionally, we see that the default value of --circular-size is 45. Does that mean 45 kb? Given that the mitogenome of our model is roughly 16 kb, would we benefit from changing this option?
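One common way to detect a circular assembly is to test whether the first k bases of a contig repeat at its end, in which case the duplicated overlap can be trimmed. The sketch below illustrates that generic technique only; we have not verified in the MitoFinder source how --circular-size is used, so reading it as an overlap length in base pairs is an assumption, and both helper names are hypothetical.

```python
def has_terminal_overlap(contig: str, overlap: int) -> bool:
    """True if the first `overlap` bases equal the last `overlap` bases (assumed circularity test)."""
    seq = contig.upper()
    # Reject nonsensical overlap sizes, including ones larger than half the contig.
    if overlap <= 0 or 2 * overlap > len(seq):
        return False
    return seq[:overlap] == seq[-overlap:]


def trim_terminal_overlap(contig: str, overlap: int) -> str:
    """Drop the duplicated end of a contig whose ends overlap; otherwise return it unchanged."""
    return contig[:-overlap] if has_terminal_overlap(contig, overlap) else contig
```

Under this interpretation the value would be an overlap length, independent of the ~16 kb genome size.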

Thanks a lot in advance for your reply, Simon

RemiAllio commented 4 years ago

Hello Simon,

First of all, MitoFinder was primarily designed to facilitate the assembly and extraction of the mitochondrial signal from NGS data. The annotation step is a bonus that we considered useful and that worked well in our cases. I am a bit surprised by your results. Thank you for reporting them!

To answer your questions properly, I need more details. First, are there internal stop codons in the file 1191_mtDNA_contig_genes_AA.fasta? Second, did you specify the organism type (mitochondrial genetic code) in MitoFinder and in MITOS2? Third, did you try annotating the contig assembled with metaSPAdes outside of MitoFinder using MitoFinder's -a option? Finally, how closely related is the reference used in MitoFinder to your focal species?
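The first question can be checked mechanically: in a translated protein FASTA, a '*' anywhere other than the final position marks an internal stop codon. A minimal sketch, assuming a plain FASTA file with one protein per record (we have not checked the exact header format MitoFinder writes); `find_internal_stops` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def find_internal_stops(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map record name -> 0-based positions of internal '*' (a single terminal '*' is allowed)."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0] if len(line) > 1 else ""
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    hits = {}
    for rec_name, parts in seqs.items():
        protein = "".join(parts)
        body = protein[:-1] if protein.endswith("*") else protein
        positions = [i for i, aa in enumerate(body) if aa == "*"]
        if positions:
            hits[rec_name] = positions
    return hits
```

For example, `with open("1191_mtDNA_contig_genes_AA.fasta") as fh: print(find_internal_stops(fh))`; an empty dict means no internal stops were found.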

Cheers, Rémi

sihellem commented 4 years ago

Dear Rémi,

Thanks a lot for your prompt answer!

To answer your questions:

  1. Internal stop codons (*) do not occur in 1191_mtDNA_contig_genes_AA.fasta.
  2. We specified code 5 (invertebrate mitochondrial) in both MitoFinder and MITOS2.
  3. We just did; please find the results attached. The same tRNAs are missing as before: metaspades_scaffold_to_mitofinder_annotation.txt
  4. We used all RefSeq mitogenomes (n=31), which include several Kalotermitidae, the family this sample belongs to.
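Specifying the genetic code matters because the same codon can translate differently: under NCBI translation table 5 (invertebrate mitochondrial), TGA codes for tryptophan rather than stop, so translating with the standard code can create spurious internal stops. A minimal sketch that builds table 5 from the standard table by applying its four codon reassignments (AGA and AGG to Ser, ATA to Met, TGA to Trp); the helpers are our own, not MitoFinder's or MITOS2's code.

```python
from itertools import product
from typing import Dict, Optional

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Standard codon table, optionally with some codons reassigned."""
    table = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    table.update(overrides or {})
    return table


STANDARD = codon_table()
# NCBI table 5 differs from table 1 at exactly these four codons.
INVERTEBRATE_MITO = codon_table({"AGA": "S", "AGG": "S", "ATA": "M", "TGA": "W"})


def translate(seq: str, table: Dict[str, str]) -> str:
    """Translate complete codons in frame 0; codons not in the table (e.g. containing N) become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))
```

With this sketch, `ATGTGAAAATAA` translates to `M*K*` under the standard code (an internal stop) but to `MWK*` under table 5.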

Thank you in advance for your reply, Cheers, Simon

RemiAllio commented 4 years ago

Dear Simon,

Thank you for your answers.

If there are no internal stop codons in the file, the MITOS2 annotation of the contig from MitoFinder must have failed. Why? I don't know; that is why I asked whether you had specified the genetic code in MITOS2 in this case. The only way to understand the difference between the contigs obtained from MitoFinder and from metaSPAdes (outside MitoFinder) is to align and compare them. Could you do this?
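For two assemblies of the same circular mitogenome, "different" can simply mean rotated or reverse-complemented, so it is worth ruling those cases out before a full alignment. A rough pre-check sketch (our own helper, not part of MitoFinder); contigs of different lengths or with indels still need a proper aligner such as BLAST or MAFFT.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def compare_contigs(a: str, b: str) -> str:
    """Classify how contig b relates to contig a, treating both as circular."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identical"
    if len(a) != len(b):
        return "different lengths: use an aligner"
    # For equal lengths, b is a rotation of a exactly when b occurs in a + a.
    if b in a + a:
        return "identical up to rotation"
    rc = revcomp(b)
    if rc == a:
        return "reverse complement"
    if rc in a + a:
        return "reverse complement, rotated"
    mismatches = sum(x != y for x, y in zip(a, b))
    return f"same length, ungapped mismatches: {mismatches}"
```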

Then, for the missing tRNAs, I suspect that ARWEN (used in MitoFinder) is missing them... Sorry about that.

Finally, comparing the annotations from MitoFinder and MITOS2, I found only a few differences (that's cool). My advice is to align either your new mitogenome or the extracted genes (with their differing annotations) against a well-annotated reference mitogenome to see which of the two annotations is better supported. Does that make sense?

I hope this can help you. Cheers, Rémi

sihellem commented 4 years ago

Dear Rémi,

I apologize for my delayed answer; we had a few things going on here.

We just looked at both contigs: they are 100% identical and start and end at the same positions.

So it really does seem to be an ARWEN-related issue. In our case, we will therefore have to switch to MITOS2 for annotation, as it yields a more complete annotation relative to the references.

Best regards, Simon

RemiAllio commented 4 years ago

Dear Simon,

I am a bit confused: if the two contigs are 100% identical, why did MITOS2 fail to annotate the contig from MitoFinder?

Anyway, I'm happy that you've found a solution to your case. Cheers, Rémi

sihellem commented 4 years ago

Dear Rémi,

My apologies. We just noticed that for the previous comparison of annotations, we extracted the wrong contig.

For our last reply, we used two contigs (which proved to be 100% identical to each other) and made four comparisons:

  1. Contig produced by MitoFinder:
     a. Annotated by MitoFinder: missing tRNAs.
     b. Annotated by MITOS2: complete annotation compared to references.
  2. Contig produced by metaSPAdes outside of MitoFinder:
     c. Annotated by MitoFinder: missing tRNAs.
     d. Annotated by MITOS2: complete annotation compared to references.

So even though our earlier comparison was mistaken, these annotation discrepancies still occur and appear to be ARWEN-dependent.

Cheers, Simon

RemiAllio commented 3 years ago

Hi Simon,

Just a quick update: a new version of MitoFinder (v1.4) is out! I added MiTFi (default) and tRNAscan-SE for the tRNA annotation step. I hope this will improve the final annotation!

Thanks again for your feedback, Cheers, Rémi

sihellem commented 3 years ago

Dear Rémi,

Thanks for the update! I will notify our lab members.

Cheers, Simon

RemiAllio commented 3 years ago

Dear Simon,

Please do not hesitate to send us your comments on this version!

Cheers, Rémi