RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

Varying Lengths and Missing Genes in Outputs #51

Open Alexyates103 opened 1 year ago

Alexyates103 commented 1 year ago

I am trying to assemble whole mitogenomes in MitoFinder using MetaSPAdes as the assembler. I have clean paired-end UCE data for 4 octocoral species (90 specimens per species). As a reference, I am using a GenBank mitogenome from one of these species, which is closely related to the other 3 and is 18,947 bp long. However, my MitoFinder outputs vary in length. Roughly half of my assemblies do not contain all 16 genes. The other half have every gene accounted for but still vary in length: some are ~24,000 bp long, whereas others are ~15,000 bp. I expected them to be similar in length to the reference.

My main questions are as follows. Why do my assembly outputs vary so much in length? I want to create a whole mitogenome for each specimen. For the assemblies > 20,000 bp, how can I tell which sequence is additional (potentially duplicated), so that I can remove it and produce reliable whole mitogenomes? For the assemblies that are shorter than the reference genome, what could explain the missing information in the final outputs?
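To make the length comparison concrete, here is a minimal sketch of how the output sequences could be sorted into the groups described above. It is only an illustration, not part of MitoFinder: the function names `read_fasta_lengths` and `classify` are hypothetical, and the only numbers used are the 18,947 bp reference length and the 20,000 bp cutoff mentioned in this post.

```python
REFERENCE_LENGTH = 18_947  # reference mitogenome length given in the post
LONG_CUTOFF = 20_000       # cutoff for "too long" assemblies, as used in the post


def read_fasta_lengths(lines):
    """Return {record_name: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def classify(length, reference=REFERENCE_LENGTH, long_cutoff=LONG_CUTOFF):
    """Place a sequence length into one of the three groups from the post."""
    if length > long_cutoff:
        return "longer than cutoff"
    if length < reference:
        return "shorter than reference"
    return "near reference"


def summarize(lines):
    """Map each FASTA record to (length, group)."""
    return {
        name: (length, classify(length))
        for name, length in read_fasta_lengths(lines).items()
    }
```

`summarize` accepts any iterable of lines, so an open file handle for a FASTA output can be passed directly. If a sample's output contains several contigs, each record is reported separately rather than summed.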

I can upload any files if that would help. I would really appreciate feedback on what could be causing these issues in my outputs, as I am not sure why I am seeing this.

Thank you very much!