RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

Not finding the mito in assembled CLR data #45

Open aureliendejode opened 1 year ago

aureliendejode commented 1 year ago

Hello,

I am trying to find the mitochondrial sequences in my assembly. I assembled PacBio CLR reads of a mollusc genome with CANU. As references I used two mitochondrial genomes of the same species, with genetic code 5 (invertebrate mitochondrial), but MitoFinder reported that it could not find any mitochondrial sequence in contigs shorter than 25,000 bp.

Does anyone know what is happening? Would it be better to run MitoFinder on the reads in this particular case?

Thanks for your help

Aurélien

RemiAllio commented 1 year ago

Hello Aurélien,

There are two possibilities here. Either CANU discarded the mitochondrial sequences because of their suspicious coverage (i.e. much higher than that of the genomic sequences), or CANU created chimeric sequences. In the latter case, you can allow MitoFinder to search longer contigs (--max-contig-size option) and check whether several mitogenomes were concatenated during the assembly step. Unfortunately, MitoFinder is not yet designed to handle long-read data, so starting from the reads is not possible in your case. You can try to assemble your reads with an alternative assembler ...

Sorry for the inconvenience. Best regards, Rémi
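To check whether a long contig consists of several mitogenome copies joined end to end, as suggested above, one rough approach is to look for the distance at which the contig starts repeating itself. The following is a minimal sketch and not part of MitoFinder: the function name `estimate_repeat_period` and its parameters are hypothetical, and it assumes the copies differ only by substitutions (no indels), which may not hold for uncorrected CLR data.

```python
def estimate_repeat_period(seq, seed_len=20, min_identity=0.9):
    """Return the length of the repeated unit if seq looks like
    tandem copies of one sequence, otherwise None.

    Hypothetical helper: exact seed matching plus a shifted
    self-comparison, so indels between copies will defeat it.
    """
    seq = seq.upper()
    seed = seq[:seed_len]
    # Look for later occurrences of the first seed_len bases.
    pos = seq.find(seed, 1)
    while pos != -1:
        overlap = len(seq) - pos
        # Compare the contig with itself shifted by `pos` bases.
        matches = sum(a == b for a, b in zip(seq, seq[pos:]))
        if overlap >= seed_len and matches / overlap >= min_identity:
            return pos
        pos = seq.find(seed, pos + 1)
    return None
```

If the returned period is close to the expected mitogenome length for the species, the contig is probably a concatenation, and the first `period` bases can be kept as a single copy.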

aureliendejode commented 1 year ago

Hello Rémi,

Thanks for this. I was able to retrieve the contig with MitoFinder from the raw CANU assembly. My guess is that it was eliminated by purge_dups, which filters on coverage. However, MitoFinder could not circularize it. Is there something I can do to help the circularization?

Aurélien
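A generic way to test a contig for circularity by hand is to check whether its end repeats its start, then trim the redundant copy. The sketch below illustrates that idea only; it is not MitoFinder's own circularization method, the names `terminal_overlap` and `trim_circular` are hypothetical, and it assumes the two ends match exactly, which residual long-read errors could prevent.

```python
def terminal_overlap(seq, min_len=20, max_len=None):
    """Length of the longest prefix of seq (at least min_len bases)
    that also appears as its suffix, or 0 if there is none."""
    seq = seq.upper()
    # Never test overlaps longer than half the contig by default.
    max_len = min(max_len or len(seq) // 2, len(seq) - 1)
    for k in range(max_len, min_len - 1, -1):
        if seq.endswith(seq[:k]):
            return k
    return 0


def trim_circular(seq, min_len=20):
    """Return seq without the repeated end if an exact terminal
    overlap is found, otherwise None."""
    k = terminal_overlap(seq, min_len)
    return seq[:len(seq) - k] if k else None
```

A result of `None` does not prove that the molecule is linear: it only means that no exact overlap of at least `min_len` bases was found between the two ends of the contig.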

aureliendejode commented 10 months ago

Hello Rémi,

I was wondering whether you have any insight into why MitoFinder was unable to circularize the mitochondrial DNA?

Best, Aurélien