MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

Repeat Contractions and Expansions are the most frequent call in a bacterial genome #54

Closed: derekcg closed this issue 1 year ago

derekcg commented 1 year ago

Hello Dr. Maria Nattestad,

I am a PhD student researching Mycobacterium tuberculosis genomes with long-read sequencing. Assemblytics is one of the variant callers we are using to study structural variants. However, the majority of variants that Assemblytics calls in our de novo assembled genomes are “repeat contractions” or “repeat expansions.” We think that Assemblytics is classifying the majority of the genome as repetitive. This could be due to the esx, PE, and PPE gene families, three large and highly paralogous gene families that pepper the M. tuberculosis genome. They may make it rare to find 10 kb stretches that map unambiguously, in their entirety, to one region of the genome. Unfortunately, the repeat contraction and repeat expansion calls don’t report where the implied variants lie within the repetitive elements. In many cases Assemblytics reports repeat contractions and expansions with coordinates 20 kb apart and a size of only 1000 bp. Presumably these are 1000 bp insertions and deletions within 20 kb regions that Assemblytics labeled as repetitive. In these cases we have no idea where within the 20 kb region the 1000 bp deletion is, which makes it difficult to determine which genes in that region the variant might affect.
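To find the calls described above (a large reported interval but a small variant size), one option is to scan the Assemblytics BED output and compare each call's interval span with its reported size. The sketch below assumes the BED columns begin `reference, ref_start, ref_stop, ID, size, strand, type` and that repeat calls are labeled `Repeat_contraction` / `Repeat_expansion`; check both against your own output file. The function names and the `min_ratio` cutoff are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed type labels for repeat calls in the Assemblytics BED output.
REPEAT_TYPES = {"Repeat_contraction", "Repeat_expansion"}


@dataclass
class SVCall:
    ref: str
    start: int
    stop: int
    size: int
    svtype: str


def parse_bed_line(line: str) -> SVCall:
    # Assumed column order: reference, ref_start, ref_stop, ID, size, strand, type, ...
    f = line.rstrip("\n").split("\t")
    return SVCall(ref=f[0], start=int(f[1]), stop=int(f[2]), size=int(f[4]), svtype=f[6])


def imprecise_repeat_calls(lines: Iterable[str], min_ratio: float = 5.0) -> List[Tuple[SVCall, float]]:
    """Return (call, span/size) for repeat calls whose interval is much wider than the variant."""
    out = []
    for line in lines:
        # Skip blank lines and the header line.
        if not line.strip() or line.startswith("#"):
            continue
        call = parse_bed_line(line)
        if call.svtype not in REPEAT_TYPES or call.size <= 0:
            continue
        ratio = (call.stop - call.start) / call.size
        if ratio >= min_ratio:
            out.append((call, ratio))
    return out


example = [
    "#reference\tref_start\tref_stop\tID\tsize\tstrand\ttype",
    "chr\t100000\t120000\tAssemblytics_w_1\t1000\t+\tRepeat_contraction",
    "chr\t50000\t51000\tAssemblytics_w_2\t1000\t+\tDeletion",
]
for call, ratio in imprecise_repeat_calls(example):
    print(call.ref, call.start, call.stop, call.svtype, ratio)
```

On this toy input, only the 20 kb repeat contraction with a 1000 bp size is reported, with a span-to-size ratio of 20.0. Flagging such calls only narrows down where to look; it does not locate the variant inside the interval.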

Could a parameter be added to Assemblytics to adjust the 10 kb minimum anchor length threshold? That may allow Assemblytics to find structural variants inside non-repetitive regions shorter than 10 kb that lie between repetitive genes.
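The following toy model may help explain why the anchor threshold matters. It is a simplified illustration, not Assemblytics' actual code. It assumes that an alignment's "unique length" is the number of query bases it covers that no other alignment covers, and that only alignments meeting a minimum unique length are kept as anchors. The names `unique_length` and `unique_anchors` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the query assembly


def unique_length(idx: int, alignments: List[Interval]) -> int:
    """Bases of alignment idx not covered by any other alignment (simplified model)."""
    start, end = alignments[idx]
    # Overlaps with every other alignment, clipped to this alignment's interval.
    clipped = sorted(
        (max(s, start), min(e, end))
        for j, (s, e) in enumerate(alignments)
        if j != idx and s < end and e > start
    )
    # Merge overlapping clipped intervals and total the covered bases.
    covered = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return (end - start) - covered


def unique_anchors(alignments: List[Interval], min_unique: int) -> List[Interval]:
    """Keep alignments whose unique length meets the threshold."""
    return [a for i, a in enumerate(alignments) if unique_length(i, alignments) >= min_unique]


alns = [(0, 12000), (11000, 17000), (15000, 30000)]
print(unique_anchors(alns, 10000))  # the 6 kb middle alignment has only 3 kb unique
print(unique_anchors(alns, 2000))   # a smaller threshold keeps all three
```

In this model, a short alignment squeezed between two repeats has too little unique sequence to pass a 10 kb threshold, so any variant inside it can only surface as part of a wider call. Lowering the threshold keeps it as an anchor.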

Alternatively, is there any way for Assemblytics to determine where within the repetitive element the structural variant(s) implied by a repeat contraction or expansion might be?

We think Assemblytics is a great tool for finding structural variants in de novo assembled genomes, and we thank you and your team for publishing it.

Thank you for your time.

Regards, Derek Conkle-Gutierrez JDP Global Health Student, SDSU and UCSD Student Research Assistant, Laboratory for Pathogenesis of Clinical Drug Resistance and Persistence Pronouns: he/him

derekcg commented 1 year ago

I apologize; there already is a parameter to adjust the unique sequence threshold. We had simply misunderstood the parameter.