Yukihirokinjo / TCSF_IMRA

Improving de novo assembly of endosymbiont genomes.

non-arbitrary linearization of circular assemblies #4

Open AlesBucek opened 1 year ago

AlesBucek commented 1 year ago

Dear Kinjo-san, TCSF_IMRA is working great for mt genome assembly, but relatively often the resulting mitochondrial contig is linearized at an arbitrary position that differs from where the reference mitochondrial genome was linearized. I understand why this happens: it is the behavior of the assemblers, not a feature of TCSF_IMRA. Still, I'm wondering whether you have previously tried to address this behavior and enforce the point of linearization of circular assemblies?

My current workflow is to check the mt contigs for circularity and then re-linearize them so that all the mt genomes are linearized in the same region. However, this approach sometimes fails to recognize circular assemblies, can introduce frameshifts, and generally requires a lot of manual curation. Thanks!
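For illustration, the check-and-re-linearize workflow described above could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not part of TCSF_IMRA: it assumes a circular contig shows up as an exact repeat of its first bases at its end, and that a short anchor sequence (for example, the first bases of the reference mt genome) occurs exactly in the contig. The function names `find_terminal_overlap` and `relinearize`, the `min_overlap` parameter, and the toy sequences are all hypothetical.

```python
# Hypothetical helper names; toy data only.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_terminal_overlap(seq, min_overlap=20):
    """Length of the longest exact prefix/suffix match, or 0 if none.

    Only overlaps between min_overlap and half the contig length are tried.
    """
    n = len(seq)
    for k in range(n // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def relinearize(seq, anchor, min_overlap=20):
    """Trim the duplicated end and rotate the circle to start at `anchor`.

    Both strands are tried, so the result may be the reverse complement
    of the input contig.
    """
    k = find_terminal_overlap(seq, min_overlap)
    if k == 0:
        raise ValueError("no exact terminal overlap; contig may not be circular")
    circle = seq[:-k]  # keep one copy of the overlapping region
    if len(anchor) > len(circle):
        raise ValueError("anchor is longer than the circular sequence")
    for candidate in (circle, revcomp(circle)):
        # Searching the doubled sequence finds anchors that span the old ends.
        pos = (candidate + candidate).find(anchor)
        if pos != -1:
            return candidate[pos:] + candidate[:pos]
    raise ValueError("anchor not found on either strand")


if __name__ == "__main__":
    reference = "ATGCCGTTAGGCATCCGATTACG"      # toy "reference" circle
    rotated = reference[10:] + reference[:10]  # same circle, different start
    contig = rotated + rotated[:8]             # assembler-style terminal repeat
    print(relinearize(contig, "ATGCCG", min_overlap=8) == reference)
```

Because the rotation only reorders bases after removing an exactly duplicated end, it does not itself insert or delete bases within the circle. The trade-off is that an overlap containing even a single mismatch is not detected, so such contigs would still need an alignment-based check or manual curation.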

Yukihirokinjo commented 1 year ago

Hi Ales-san,

Sorry for the late reply. I've been very busy with the KAKENHI proposal this week.

Regarding the linearization problem, I actually have a script to address that. The best solution would probably be implementing this re-linearization procedure within the IMRA pipeline. If you can wait for a week or two, I will update IMRA with this option. Is that OK with you?

By the way, I've noticed that SPAdes tends to introduce some mis-assemblies when we don't specify the '--careful' option. So, I've updated IMRA to work with this option.

Best, Kinjo
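For readers who run SPAdes directly, the `--careful` option mentioned above is passed on the `spades.py` command line. The sketch below only builds such a command for paired-end reads; it is not how IMRA invokes SPAdes (that is not shown in this thread), and the helper name `spades_command`, the file names, and the thread count are hypothetical.

```python
import shlex


def spades_command(reads_1, reads_2, outdir, threads=4, careful=True):
    """Build a spades.py argument list for one paired-end library."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", outdir, "-t", str(threads)]
    if careful:
        # --careful asks SPAdes to reduce mismatches and short indels.
        cmd.append("--careful")
    return cmd


if __name__ == "__main__":
    # Hypothetical input files; print the command instead of running it.
    print(shlex.join(spades_command("reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out")))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once SPAdes is installed and on the `PATH`.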