MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

'Potentially' the same variant called as a deletion and a tandem expansion #31

Closed: AyushSaxena closed this issue 4 years ago.

AyushSaxena commented 4 years ago

Dear Maria,

We have genotyped many closely related lines of C. elegans using PacBio sequencing: Mutation Accumulation (MA) lines that diverged over 500 generations, with about 200 SNPs of genetic distance between each pair, so they are nearly identical. Now we're trying to estimate how many SVs exist between two such very similar genotypes. This is a tricky problem, because in an MA experiment all mutations are new and unique, hence singletons, with no independent biological confirmation from other genotypes. Since false positives could also be unique to a single genotype, I am struggling to get an accurate estimate of the false positive rate. That matters because our mutation-rate estimates are extremely sensitive to false positives when the real number of mutations could be so small.

I am pasting the output of 'bedtools intersect' between two of our MA lines, which I used to find SV calls common to both (only the problematic row is shown). We expect the MA lines to share variants inherited from their ancestor, and everything unique is potentially a new mutation (MA530 vs. MA563):

II 10430559 10450529 Assemblytics_b_67 31 + Tandem_expansion -19970 -19939 tig00000036:1055793-1075732:- between_alignments II 10440530 10440560 Assemblytics_w_3 30 + Deletion 30 0 tig00000057:306002-306002:+ within_alignment 30
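The trailing 30 on this row appears to be the number of overlapping base pairs between the two calls, which is what 'bedtools intersect -wo' appends after the A and B records. A minimal pure-Python sketch of that overlap test, assuming BED's zero-based, half-open coordinates (the function name overlap_bp is hypothetical, not part of bedtools or Assemblytics):

```python
def overlap_bp(a, b):
    """Return the number of overlapping bases between two BED intervals.

    Each interval is (chrom, start, end) with half-open [start, end)
    coordinates, as in the BED format. Returns 0 when they do not overlap.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0
    return max(0, min(end_a, end_b) - max(start_a, start_b))


# Coordinates copied from the row above: the tandem expansion and the deletion.
expansion = ("II", 10430559, 10450529)
deletion = ("II", 10440530, 10440560)
print(overlap_bp(expansion, deletion))  # 30
```

The 30 bp deletion lies entirely inside the ~20 kb span of the tandem expansion call, so the reported overlap equals the deletion's full length.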

When I compare MA530 to the common ancestor, I recover the mutation (MA530 vs. ancestor):

II 10430559 10450529 Assemblytics_b_67 31 + Tandem_expansion -19970 -19939 tig00000036:1055793-1075732:- between_alignments II 10440530 10440560 Assemblytics_w_24 30 + Deletion 30 0 tig00000029:255849-255849:+ within_alignment 30

There are two possibilities:

  1. The 30 bp deletion exists in the ancestor and was also recovered in MA563, and we have found a new mutation (the tandem expansion) in MA530; or

  2. This represents a form of assembly or mapping artifact.

I suspect that these two calls could represent the same mutation. Am I onto something, or is this a coincidence? I have found other such pairs as well, with tandem contractions matching up with insertions. These parts of the genome could also be prone to SVs, so there's no telling what's true!

As a follow-up, I couldn't quite understand your strategy for estimating the false positive rate in the manuscript. I suspect we may be calling ~30-40 false positives per genome in each sample. That isn't a big deal when we have 5000 SVs, but it could completely overwhelm the signal if the real number of variants is only ~10. Do you have a recommendation?
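To make the concern concrete: if roughly the same number of false positives lands in every sample, the fraction of calls that are real depends almost entirely on how many true variants there are. A short worked sketch using the numbers from this comment; taking 35 as the false-positive count (the midpoint of the ~30-40 guess) is an assumption, and true_fraction is a hypothetical helper name:

```python
def true_fraction(true_calls, false_calls):
    """Fraction of all calls that are real (i.e. precision)."""
    return true_calls / (true_calls + false_calls)


false_positives = 35  # assumed midpoint of the ~30-40 estimate per sample

# ~5000 SV calls in total, 35 of them false: almost every call is real.
print(round(true_fraction(5000 - false_positives, false_positives), 3))  # 0.993
# Only ~10 real mutations: under a quarter of the calls would be real.
print(round(true_fraction(10, false_positives), 3))  # 0.222
```

The same absolute error rate that costs under 1% precision in a population-scale comparison leaves fewer than one in four calls real in an MA comparison.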

Thank you once again for writing such a wonderful tool and following up on all my queries!

Ayush

MariaNattestad commented 4 years ago

Hi Ayush

When two different contigs cover the same sequence and therefore show the same variant, it will be called twice. The two calls can be of different types when one contig has a single alignment encapsulating the event (insertion) while the other has two alignments overlapping there (tandem expansion). A tandem expansion and a deletion do not make sense as the same event, though, because one is an increase in sequence while the other is a decrease.
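This rule (two overlapping calls can only be one event if both add sequence or both remove it) can be checked mechanically on intersect rows like the ones in the question. A minimal Python sketch, assuming each call uses the 11-column Assemblytics BED layout (reference, ref_start, ref_stop, ID, size, strand, type, ref_gap_size, query_gap_size, query_coordinates, method) and that the row ends with the overlap column from 'bedtools intersect -wo'. The helper names are hypothetical, and grouping the Repeat_* types with gains or losses is inferred from their names:

```python
# Sketch only: the column layout and the gain/loss grouping are assumptions.
# Assumed Assemblytics BED columns for one call; a 'bedtools intersect -wo'
# row holds call A's columns, then call B's, then the overlap in bp.
FIELDS = [
    "reference", "ref_start", "ref_stop", "ID", "size", "strand", "type",
    "ref_gap_size", "query_gap_size", "query_coordinates", "method",
]

# SV types that add sequence relative to the reference, and those that remove it.
GAIN_TYPES = {"Insertion", "Tandem_expansion", "Repeat_expansion"}
LOSS_TYPES = {"Deletion", "Tandem_contraction", "Repeat_contraction"}


def direction(sv_type):
    """Return 'gain' or 'loss' for an Assemblytics SV type."""
    if sv_type in GAIN_TYPES:
        return "gain"
    if sv_type in LOSS_TYPES:
        return "loss"
    raise ValueError(f"unknown SV type: {sv_type}")


def parse_intersect_row(line):
    """Split one intersect row into (call_a, call_b, overlap_bp)."""
    cols = line.split()  # works for tab- or space-separated rows
    n = len(FIELDS)
    if len(cols) != 2 * n + 1:
        raise ValueError(f"expected {2 * n + 1} columns, got {len(cols)}")
    call_a = dict(zip(FIELDS, cols[:n]))
    call_b = dict(zip(FIELDS, cols[n:2 * n]))
    return call_a, call_b, int(cols[2 * n])


def could_be_same_event(call_a, call_b):
    """Overlapping calls can only be one event if they change sequence the same way."""
    return direction(call_a["type"]) == direction(call_b["type"])


# The MA530 vs. MA563 row quoted in the question.
row = ("II 10430559 10450529 Assemblytics_b_67 31 + Tandem_expansion -19970 "
       "-19939 tig00000036:1055793-1075732:- between_alignments "
       "II 10440530 10440560 Assemblytics_w_3 30 + Deletion 30 0 "
       "tig00000057:306002-306002:+ within_alignment 30")
a, b, overlap = parse_intersect_row(row)
print(a["type"], b["type"], overlap, could_be_same_event(a, b))
# Tandem_expansion Deletion 30 False
```

By the same check, an insertion overlapping a tandem expansion passes (both gains), while the tandem contraction and insertion pairs mentioned in the question fail.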

In general, though, Assemblytics is really only finding and categorizing overlaps and gaps between alignments (and within alignments for insertions and deletions). The number of variants, as well as of false positives and negatives, depends a lot on the assembly and mapping, as well as on the actual number of structural variants. Assemblytics only finds the gaps; it can't tell whether biology or errors in the assembly created them. If you have very few events, it is probably best to also inspect the alignments by eye. MUMmer has other great tools for this.
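As a rough illustration of what categorizing overlaps and gaps can mean, here is a simplified sketch that labels a gap between two consecutive alignments from its reference and query gap sizes, where negative values mean the two alignments overlap. This is an assumed reconstruction, not Assemblytics' source code: the rules below happen to reproduce both calls quoted in this thread (a reference gap of -19970 with a query gap of -19939 gives a 31 bp Tandem_expansion, and the within-alignment deletion's gap sizes of 30 and 0 give a 30 bp Deletion). The real tool also applies size filters and calls within-alignment indels separately. The function name classify_between is hypothetical:

```python
def classify_between(ref_gap, query_gap):
    """Label a gap/overlap between two consecutive alignments.

    ASSUMPTION: a simplified reconstruction for illustration, not the
    Assemblytics source. Negative gap sizes mean the alignments overlap.
    Returns (sv_type, size_bp), or (None, 0) if there is no net change.
    """
    net = query_gap - ref_gap  # > 0: the query (assembly) has extra sequence
    if net == 0:
        return None, 0
    if ref_gap < 0 and query_gap < 0:
        family = "Tandem"  # alignments overlap on both sequences
    elif ref_gap >= 0 and query_gap >= 0:
        family = None  # plain gap on both sequences
    else:
        family = "Repeat"  # overlap on one sequence, gap on the other
    if family is None:
        sv_type = "Insertion" if net > 0 else "Deletion"
    else:
        sv_type = f"{family}_{'expansion' if net > 0 else 'contraction'}"
    return sv_type, abs(net)


print(classify_between(-19970, -19939))  # ('Tandem_expansion', 31)
print(classify_between(30, 0))  # ('Deletion', 30)
```

The sketch only does arithmetic on alignment coordinates. Nothing in it can distinguish a real expansion from two contigs, or two alignments, that overlap because of an assembly or mapping error.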

Best of luck with your research!

Maria